TokenHub exposes an OpenAI-compatible REST API that lets you send inference requests to multiple LLM providers through a single unified endpoint. Every endpoint lives under https://api.tokenhub.ai/v1, takes JSON request bodies, and returns JSON, so any tooling that works with the OpenAI API works with TokenHub as well.

Base URL

https://api.tokenhub.ai/v1

Authentication

All requests require a Bearer token in the Authorization header:
Authorization: Bearer YOUR_API_KEY
See Authentication for full details and code examples.
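
As a minimal sketch of the header in practice, the snippet below builds (but does not send) an authenticated request using only the Python standard library. The `build_request` helper is a hypothetical name introduced for illustration and is not part of any TokenHub SDK.

```python
import json
import urllib.request

BASE_URL = "https://api.tokenhub.ai/v1"


def build_request(path, api_key, payload=None):
    """Build an authenticated request object (hypothetical helper)."""
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if payload is not None:
        # JSON bodies are sent as UTF-8 encoded bytes
        headers["Content-Type"] = "application/json"
        data = json.dumps(payload).encode("utf-8")
    # urllib uses POST when a body is present and GET otherwise
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers)


req = build_request("/models", "YOUR_API_KEY")
print(req.get_method(), req.full_url)
```

Passing the resulting object to `urllib.request.urlopen(req)` would send it; that step is left out here so the sketch runs without network access.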

Available endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | `/v1/chat/completions` | Generate chat responses from a messages array |
| POST | `/v1/completions` | Generate text completions from a prompt string |
| POST | `/v1/embeddings` | Generate vector embeddings for text input |
| GET | `/v1/models` | List all available models and their capabilities |

Chat completions

Send a messages array and receive an assistant reply. Supports streaming, function calling, and all OpenAI-compatible parameters.
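
To illustrate what consuming a streamed reply might look like, here is a sketch that extracts text from server-sent event lines. It assumes TokenHub streams in the same format as the OpenAI API (`data: {...}` lines ending with `data: [DONE]`), which this page implies but does not spell out. The `iter_stream_text` helper and the sample lines are made up for illustration.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from OpenAI-style streaming lines (assumed format)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data event
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hand-written sample lines, not captured from a real response
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))  # Hello!
```

When using an OpenAI SDK, passing `stream=True` to the chat completions call handles this parsing for you; the sketch is mainly useful for raw HTTP clients.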

Completions

Legacy text completion endpoint. Accepts a prompt string and returns generated text.

Embeddings

Generate vector representations of text for semantic search, RAG pipelines, and classification.
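
For the semantic search use case, a common approach is to rank stored vectors by cosine similarity to a query vector. The sketch below shows that ranking step only; the two-dimensional vectors are toy values standing in for real embeddings, and `rank_by_similarity` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(query_vec, doc_vecs):
    """Return document names ordered from most to least similar."""
    return sorted(
        doc_vecs,
        key=lambda name: cosine_similarity(query_vec, doc_vecs[name]),
        reverse=True,
    )


# Toy vectors for illustration; real embeddings have many more dimensions
docs = {"cats": [1.0, 0.0], "dogs": [0.0, 1.0], "kittens": [0.9, 0.1]}
print(rank_by_similarity([1.0, 0.0], docs))
```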

Models

Retrieve the full list of models available on TokenHub, including provider, context window, and capabilities.
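
As a rough sketch of working with the model list, the snippet below groups model IDs by provider. It assumes the response follows the OpenAI list shape (a `data` array of objects with an `id` field) and that IDs use the `provider/model` form seen in `openai/gpt-4o`; neither is confirmed on this page, and the second ID in the sample is a placeholder.

```python
from collections import defaultdict


def group_by_provider(models_response):
    """Group model names by the provider prefix of their ID (assumed format)."""
    grouped = defaultdict(list)
    for model in models_response.get("data", []):
        provider, sep, name = model["id"].partition("/")
        if not sep:
            # ID has no provider prefix; file it under a catch-all key
            provider, name = "unknown", model["id"]
        grouped[provider].append(name)
    return dict(grouped)


# Hand-written sample, not a real response
sample_models = {
    "object": "list",
    "data": [{"id": "openai/gpt-4o"}, {"id": "example-provider/example-model"}],
}
print(group_by_provider(sample_models))
```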

OpenAI SDK compatibility

Because TokenHub implements the OpenAI API spec, you can use any OpenAI SDK by pointing base_url at TokenHub:
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_TOKENHUB_API_KEY",
    base_url="https://api.tokenhub.ai/v1",
)

response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

TokenHub-specific headers

TokenHub extends the OpenAI request/response contract with a small set of headers for routing control and observability.

Request headers

| Header | Values | Description |
| --- | --- | --- |
| `X-Inferoute-Strategy` | `cost`, `latency`, `availability`, `round-robin` | Routing strategy for this request |
| `X-Inferoute-Fallback` | Comma-separated model IDs | Ordered list of fallback models if the primary is unavailable |
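
One way to assemble these request headers is sketched below. The `routing_headers` helper is hypothetical, the fallback model IDs are placeholders, and joining IDs with a bare comma (no space) is an assumption about the expected format.

```python
ALLOWED_STRATEGIES = {"cost", "latency", "availability", "round-robin"}


def routing_headers(strategy=None, fallbacks=None):
    """Build the optional routing headers (hypothetical helper)."""
    headers = {}
    if strategy is not None:
        if strategy not in ALLOWED_STRATEGIES:
            raise ValueError(f"unknown routing strategy: {strategy!r}")
        headers["X-Inferoute-Strategy"] = strategy
    if fallbacks:
        # Order matters: the list is documented as an ordered fallback chain
        headers["X-Inferoute-Fallback"] = ",".join(fallbacks)
    return headers


# Placeholder model IDs for illustration only
print(routing_headers("latency", ["provider-b/model-b", "provider-c/model-c"]))
```

With the OpenAI Python SDK, a dictionary like this can be passed per request through the `extra_headers` argument of `client.chat.completions.create(...)`.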

Response headers

| Header | Description |
| --- | --- |
| `X-Inferoute-Provider` | The provider that served the request (e.g., openai, anthropic) |
| `X-Inferoute-Request-Id` | Unique request identifier for debugging and support |
| `X-Inferoute-Latency-Ms` | End-to-end request latency in milliseconds |
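
For logging or debugging, the response headers can be collected into a small record. The sketch below assumes only the three header names listed above; `RoutingInfo` and `parse_routing_info` are hypothetical names, and the sample values are invented.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RoutingInfo:
    provider: Optional[str]
    request_id: Optional[str]
    latency_ms: Optional[float]


def parse_routing_info(headers: Mapping[str, str]) -> RoutingInfo:
    """Extract the TokenHub response headers; missing ones become None."""
    # HTTP header names are case-insensitive, so normalize before lookup
    lowered = {key.lower(): value for key, value in headers.items()}
    raw_latency = lowered.get("x-inferoute-latency-ms")
    try:
        latency = float(raw_latency) if raw_latency is not None else None
    except ValueError:
        latency = None  # tolerate a value that is not numeric
    return RoutingInfo(
        provider=lowered.get("x-inferoute-provider"),
        request_id=lowered.get("x-inferoute-request-id"),
        latency_ms=latency,
    )


# Invented sample values for illustration
info = parse_routing_info({
    "X-Inferoute-Provider": "openai",
    "X-Inferoute-Request-Id": "req_example",
    "X-Inferoute-Latency-Ms": "412",
})
print(info)
```

With the OpenAI Python SDK, raw headers are available by calling `client.chat.completions.with_raw_response.create(...)` and reading `.headers` on the result; `.parse()` then returns the usual completion object.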