The embeddings endpoint converts text into high-dimensional numeric vectors that capture semantic meaning. You can use these vectors to build retrieval-augmented generation (RAG) pipelines, power semantic search, cluster documents by topic, or classify text without fine-tuning.

Endpoint

POST https://api.tokenhub.ai/v1/embeddings

Request parameters

model (string, required)
The embedding model to use, for example text-embedding-3-small or text-embedding-3-large. Retrieve the full list of available embedding models from GET /v1/models.

input (string | string[], required)
The text to embed. Pass a single string to get one embedding, or an array of strings to embed multiple inputs in a single request. Sending one array is more efficient than making one request per string.

encoding_format (string, optional, default: "float")
Format of the returned vectors. Use "float" (the default) for a JSON array of numbers, or "base64" for a base64-encoded binary representation that reduces response size.

dimensions (integer, optional)
Number of dimensions in the output embedding. Supported only by models that accept this parameter, such as text-embedding-3-small and text-embedding-3-large. Fewer dimensions trade some accuracy for lower storage and compute cost.
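If you call the endpoint from code rather than curl, the request body is a JSON object built from the parameters above. The following minimal Python sketch uses only the standard library. `build_embeddings_request` is a hypothetical helper and is not part of any TokenHub SDK. The sketch builds the request but does not send it. Its client-side checks (the two `encoding_format` values and a positive `dimensions`) are assumptions drawn from this page, not documented server-side validation.

```python
import json
import urllib.request

# Endpoint URL as documented on this page.
EMBEDDINGS_URL = "https://api.tokenhub.ai/v1/embeddings"


def build_embeddings_request(api_key, model, inputs, encoding_format="float", dimensions=None):
    """Build (but do not send) a POST request for /v1/embeddings.

    Hypothetical helper; the client-side checks below are assumptions.
    """
    if encoding_format not in ("float", "base64"):
        raise ValueError('encoding_format must be "float" or "base64"')
    body = {"model": model, "input": inputs, "encoding_format": encoding_format}
    # Only send dimensions when requested: models that do not support it
    # may reject the request.
    if dimensions is not None:
        if dimensions <= 0:
            raise ValueError("dimensions must be a positive integer")
        body["dimensions"] = dimensions
    return urllib.request.Request(
        EMBEDDINGS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_embeddings_request(
    "YOUR_API_KEY", "text-embedding-3-small", ["first text", "second text"], dimensions=256
)
print(req.get_method(), req.full_url)
print(json.loads(req.data))
# To actually send it: urllib.request.urlopen(req)
```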

Response fields

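object (string)
Always "list".

data (object[])
The embeddings, one object per input. Each object contains index (the position of the corresponding input in the request), object (always "embedding"), and embedding (the vector: an array of floats, or a base64 string when encoding_format is "base64").

model (string)
The model that generated the embeddings.

usage (object)
Token counts for the request: prompt_tokens (tokens in the input) and total_tokens.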
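Each entry in data carries an index, so you can match results to inputs without relying on list order. The Python sketch below defines `embeddings_from_response`, a hypothetical helper. It sorts entries by index and, when encoding_format was "base64", decodes each string into floats. The decoding step assumes the payload is a packed array of little-endian 32-bit floats, which is the OpenAI convention. This page does not specify the byte layout, so compare the result against a "float" response before relying on it.

```python
import base64
import struct


def embeddings_from_response(response):
    """Return the vectors from a /v1/embeddings response, ordered by index.

    Hypothetical helper. Base64 values are ASSUMED to be packed
    little-endian float32 arrays (OpenAI's convention); this page does
    not specify the layout.
    """
    vectors = []
    for item in sorted(response["data"], key=lambda d: d["index"]):
        emb = item["embedding"]
        if isinstance(emb, str):
            # encoding_format="base64": decode to bytes, then unpack 4-byte floats.
            raw = base64.b64decode(emb)
            if len(raw) % 4 != 0:
                raise ValueError("base64 payload is not a whole number of float32 values")
            emb = list(struct.unpack(f"<{len(raw) // 4}f", raw))
        vectors.append(emb)
    return vectors


def _pack(values):
    """Build a base64 string locally so the example runs without calling the API."""
    return base64.b64encode(struct.pack(f"<{len(values)}f", *values)).decode("ascii")


# Shaped like the batch response on this page, deliberately out of order.
response = {
    "object": "list",
    "data": [
        {"index": 1, "object": "embedding", "embedding": _pack([0.5, -1.0])},
        {"index": 0, "object": "embedding", "embedding": _pack([0.25, 2.0])},
    ],
    "model": "text-embedding-3-small",
}
print(embeddings_from_response(response))  # [[0.25, 2.0], [0.5, -1.0]]
```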

Examples

Single string input

curl https://api.tokenhub.ai/v1/embeddings \
  --request POST \
  --header "Authorization: Bearer YOUR_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "text-embedding-3-small",
    "input": "TokenHub routes your LLM requests intelligently."
  }'
Response:
{
  "object": "list",
  "data": [
    {
      "index": 0,
      "object": "embedding",
      "embedding": [0.0023064255, -0.009327292, 0.015797043, "..."]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 9,
    "total_tokens": 9
  }
}

Array input (batch embedding)

curl https://api.tokenhub.ai/v1/embeddings \
  --request POST \
  --header "Authorization: Bearer YOUR_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "text-embedding-3-small",
    "input": [
      "How do I reset my password?",
      "Where can I find my invoice?",
      "How do I upgrade my plan?"
    ]
  }'
Response:
{
  "object": "list",
  "data": [
    {"index": 0, "object": "embedding", "embedding": [0.0023, -0.0093, "..."]},
    {"index": 1, "object": "embedding", "embedding": [-0.0041, 0.0187, "..."]},
    {"index": 2, "object": "embedding", "embedding": [0.0112, -0.0034, "..."]}
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 27,
    "total_tokens": 27
  }
}

Common use cases

  • RAG pipelines — embed your documents at index time, embed user queries at runtime, and retrieve the closest chunks by cosine similarity before passing them to a chat completions request.
  • Semantic search — find documents that are conceptually similar to a query even when no keywords match.
  • Document clustering — group large collections of text by topic without labeled training data.
  • Classification — train a lightweight classifier on top of embeddings rather than fine-tuning a full model.
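The RAG and semantic search use cases both reduce to ranking stored vectors by cosine similarity to a query vector. Below is a minimal pure-Python sketch of that ranking step. The function names are illustrative. The toy 3-dimensional vectors stand in for real embeddings, which in practice would come from this endpoint.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return (index, score) pairs for the k stored vectors closest to the query."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors standing in for real embeddings from one model.
doc_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query_vector = [1.0, 0.1, 0.0]
for index, score in top_k(query_vector, doc_vectors, k=2):
    print(index, round(score, 3))
```

Embed queries and documents with the same model, because vectors from different models are not comparable. In a RAG pipeline, you would then include the top-ranked chunks in the chat completions prompt. For large collections, a vector index or a numerical library would typically replace this linear scan.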
Batch your inputs into a single request whenever possible. Sending 100 strings in one array call is significantly faster and cheaper than making 100 individual requests.
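For large corpora, one way to follow this advice is to split the texts into fixed-size chunks and send each chunk as the input array of one request. This page does not state a maximum array length or per-request token limit. The batch size of 100 in this sketch is therefore an assumption; tune it to whatever limits apply to your account and model. `batched_inputs` is a hypothetical helper.

```python
from typing import Iterator, List, Sequence


def batched_inputs(texts: Sequence[str], batch_size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of texts, each at most batch_size long.

    batch_size=100 is an assumed default; this page documents no limit.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])


texts = [f"document {n}" for n in range(250)]
sizes = [len(batch) for batch in batched_inputs(texts, batch_size=100)]
print(sizes)  # [100, 100, 50]
```

Because the chunks are consecutive and order-preserving, you can concatenate the per-request results (each sorted by index) to line up with the original list of texts.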