TokenHub exposes an OpenAI-compatible REST API that lets you send inference requests to multiple LLM providers through a single unified endpoint. Every request goes toDocumentation Index
Fetch the complete documentation index at: https://docs.inferoute.ai/docs/llms.txt
Use this file to discover all available pages before exploring further.
https://api.tokenhub.ai/v1, accepts JSON, and returns JSON — so any tooling that works with the OpenAI API works with TokenHub as well.
Base URL
Authentication
All requests require a Bearer token in theAuthorization header:
Available endpoints
| Method | Endpoint | Description |
|---|---|---|
POST | /v1/chat/completions | Generate chat responses from a messages array |
POST | /v1/completions | Generate text completions from a prompt string |
POST | /v1/embeddings | Generate vector embeddings for text input |
GET | /v1/models | List all available models and their capabilities |
Chat completions
Send a messages array and receive an assistant reply. Supports streaming, function calling, and all OpenAI-compatible parameters.
Completions
Legacy text completion endpoint. Accepts a prompt string and returns generated text.
Embeddings
Generate vector representations of text for semantic search, RAG pipelines, and classification.
Models
Retrieve the full list of models available on TokenHub, including provider, context window, and capabilities.
OpenAI SDK compatibility
Because TokenHub implements the OpenAI API spec, you can use any OpenAI SDK by pointingbase_url at TokenHub:
TokenHub-specific headers
TokenHub extends the OpenAI request/response contract with a small set of headers for routing control and observability.Request headers
| Header | Values | Description |
|---|---|---|
X-Inferoute-Strategy | cost, latency, availability, round-robin | Routing strategy for this request |
X-Inferoute-Fallback | Comma-separated model IDs | Ordered list of fallback models if the primary is unavailable |
Response headers
| Header | Description |
|---|---|
X-Inferoute-Provider | The provider that served the request (e.g., openai, anthropic) |
X-Inferoute-Request-Id | Unique request identifier for debugging and support |
X-Inferoute-Latency-Ms | End-to-end request latency in milliseconds |