The embeddings endpoint converts text into high-dimensional numeric vectors that capture semantic meaning. You can use these vectors to build retrieval-augmented generation (RAG) pipelines, power semantic search, cluster documents by topic, or classify text without fine-tuning.
Endpoint
Request parameters
The embedding model to use, for example text-embedding-3-small or text-embedding-3-large. Retrieve the full list of available embedding models from GET /v1/models.

The text to embed. Pass a single string for one embedding or an array of strings to embed multiple inputs in a single request. Arrays are more efficient than making one request per string.
Format for the returned vectors. Use "float" for a JSON array of numbers (default), or "base64" for a base64-encoded binary representation that reduces response size.

Number of dimensions in the output embedding. Only supported by models that accept a dimensions parameter (e.g., text-embedding-3-small, text-embedding-3-large). Reducing dimensions trades some accuracy for lower storage and compute cost.

Response fields
Always "list".

The model that generated the embeddings.
Examples
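For responses requested in "base64" format, the vector has to be decoded before it can be used. The sketch below assumes the payload is a packed sequence of little-endian 32-bit floats, which is the usual convention in OpenAI-compatible APIs. This page does not state the byte layout, so verify it against a real response. The function name decode_embedding is illustrative.

```python
import base64
import struct


def decode_embedding(b64_payload: str) -> list[float]:
    """Decode a base64 string into a list of floats.

    Assumption: the bytes are little-endian float32 values, 4 bytes each.
    """
    raw = base64.b64decode(b64_payload)
    if len(raw) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))


# Round-trip check using values that float32 represents exactly.
sample = base64.b64encode(struct.pack("<3f", 0.5, -1.25, 2.0)).decode("ascii")
decoded = decode_embedding(sample)
print(decoded)
```

Four bytes per dimension is typically smaller than the same numbers written out as JSON decimals, which is where the reduction in response size comes from.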
Single string input
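A minimal sketch of a single-string request using only the Python standard library. The base URL, the /v1/embeddings path, the bearer-token header, and the JSON field names model and input are assumptions modeled on OpenAI-compatible APIs. This page does not spell them out, so check them against the endpoint reference before relying on them.

```python
import json
import urllib.request

# Assumed values: replace with the real base URL and your API key.
BASE_URL = "https://api.example.com"
API_KEY = "YOUR_API_KEY"


def build_embedding_request(
    text: str, model: str = "text-embedding-3-small"
) -> urllib.request.Request:
    """Build (but do not send) a POST request that embeds one string."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/embeddings",  # assumed path
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_embedding_request("The quick brown fox")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To send it: urllib.request.urlopen(request), then json.load() the response.
```

Building the request separately from sending it makes the payload easy to inspect before any network call is made.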
Array input (batch embedding)
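A sketch of batching: many strings are grouped into a few array-valued requests instead of one request per string. The field names model and input and the batch size of 3 are assumptions for illustration. This page does not state a maximum array length, so choose a size the service accepts.

```python
import json
from typing import Iterator


def batched(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_batch_payloads(texts: list[str], model: str, size: int) -> list[str]:
    """Return one JSON request body per batch (assumed field names)."""
    return [
        json.dumps({"model": model, "input": chunk})
        for chunk in batched(texts, size)
    ]


texts = ["alpha", "beta", "gamma", "delta", "epsilon"]
payloads = build_batch_payloads(texts, "text-embedding-3-small", size=3)
for body in payloads:
    print(body)
```

Here five strings become two request bodies rather than five. Keep track of the order of the inputs within each batch so the returned vectors can be matched back to their source text.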
Common use cases
- RAG pipelines — embed your documents at index time, embed user queries at runtime, and retrieve the closest chunks by cosine similarity before passing them to a chat completions request.
- Semantic search — find documents that are conceptually similar to a query even when no keywords match.
- Document clustering — group large collections of text by topic without labeled training data.
- Classification — train a lightweight classifier on top of embeddings rather than fine-tuning a full model.
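The retrieval step shared by the RAG and semantic search use cases can be sketched as a cosine-similarity ranking. The example uses toy 3-dimensional vectors in place of real embeddings, and the names cosine_similarity and top_k are illustrative. A production system would normally use a vector index rather than a linear scan.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def top_k(
    query: list[float], chunks: dict[str, list[float]], k: int
) -> list[tuple[str, float]]:
    """Return the k chunk ids most similar to the query, best first."""
    scored = [
        (chunk_id, cosine_similarity(query, vec))
        for chunk_id, vec in chunks.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for embeddings of three document chunks.
chunks = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.6, 0.8, 0.0],
    "doc-c": [0.0, 0.0, 1.0],
}
query = [1.0, 0.0, 0.0]
results = top_k(query, chunks, k=2)
print(results)
```

In a RAG pipeline, the text behind the returned chunk ids would then be placed into the prompt of the chat completions request.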