The chat completions endpoint is the primary way to interact with language models on TokenHub. You send aDocumentation Index
Fetch the complete documentation index at: https://docs.inferoute.ai/docs/llms.txt
Use this file to discover all available pages before exploring further.
messages array representing a conversation, and the API returns the next assistant message. The request body is fully OpenAI-compatible — any payload that works with the OpenAI Chat Completions API works here.
Endpoint
Request parameters
Body
The model to use for this request. You can use the short name (
gpt-4o) or the provider-prefixed name (openai/gpt-4o, anthropic/claude-3-5-sonnet). Retrieve the full list of available IDs from GET /v1/models.The conversation history as an array of message objects. Each object must include
role (system, user, or assistant) and content (string).Maximum number of tokens to generate in the response. Defaults to the model’s maximum output.
Sampling temperature between
0 and 2. Lower values produce more deterministic output; higher values produce more varied output.When
true, the response is streamed as server-sent events (SSE). Each event contains a partial delta object. The stream ends with data: [DONE].Nucleus sampling parameter. The model considers only the tokens comprising the top
top_p probability mass.Number of completion choices to generate. Each choice is an independent generation.
One or more sequences where the model stops generating. The stop sequence itself is not included in the output.
Headers
Routing strategy for this request. Accepted values:
cost, latency, availability, round-robin. Defaults to your account’s configured strategy.Comma-separated list of fallback model IDs to try if the primary model is unavailable. Example:
anthropic/claude-3-5-sonnet,openai/gpt-4-turbo.Response fields
Unique identifier for this completion request.
Always
"chat.completion".The model that actually served the request. May differ from your requested model when a fallback was used.
Response headers
| Header | Description |
|---|---|
X-Inferoute-Provider | The provider that served the request (e.g., openai, anthropic) |
X-Inferoute-Request-Id | Unique request ID for support and debugging |
X-Inferoute-Latency-Ms | End-to-end request latency in milliseconds |
Examples
Basic request and response
Routing strategy and fallback
curl
Streaming
Python
data: [DONE].