TokenHub uses the same model parameter you already know from the OpenAI API. You pass a model name in your request, and TokenHub resolves it to the appropriate provider and model version. You can be as specific as you want — pinning a request to a particular provider and model — or as general as you want, delegating the selection to TokenHub entirely.
Use short aliases like gpt-4o and claude-3-5-sonnet instead of provider-prefixed names like openai/gpt-4o. Aliases keep your code provider-agnostic, making it easy to swap providers without modifying requests.

Model naming formats

TokenHub supports three model naming formats.

Provider-prefixed names

Fully qualified names that pin a request to a specific provider. Use this format when you need to guarantee which provider handles the request.
openai/gpt-4o
anthropic/claude-3-5-sonnet
google/gemini-1.5-pro
mistral/mistral-large
meta/llama-3.1-70b

Short aliases

Shorthand names that map to a canonical model across providers. TokenHub resolves the alias to the best available endpoint for that model.
| Alias | Resolves to |
| --- | --- |
| gpt-4o | OpenAI GPT-4o |
| gpt-4 | OpenAI GPT-4 |
| gpt-3.5-turbo | OpenAI GPT-3.5 Turbo |
| claude-3-5-sonnet | Anthropic Claude 3.5 Sonnet |
| claude-3-opus | Anthropic Claude 3 Opus |
| claude-3-haiku | Anthropic Claude 3 Haiku |
| gemini-1.5-pro | Google Gemini 1.5 Pro |
| gemini-1.5-flash | Google Gemini 1.5 Flash |
| mistral-large | Mistral Large |

Auto selection

Setting model to "auto" tells TokenHub to pick the most suitable model for your request based on its content, your active routing strategy, and current provider availability.
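# Assumes `client` is the OpenAI client configured for TokenHub (see "Code examples")
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Write a Python function to parse CSV files."}],
)
print(response.choices[0].message.content)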
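
The three formats can be told apart on the client side before a request is sent, which is handy for logging or validation. The following is a minimal sketch, not TokenHub's actual resolver: `ModelRef` and `classify_model_name` are hypothetical names, and it assumes that provider-prefixed names use a single "/" separator and that "auto" is matched exactly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelRef:
    kind: str                # "auto", "provider-prefixed", or "alias"
    provider: Optional[str]  # set only for provider-prefixed names
    model: Optional[str]     # None when kind is "auto"


def classify_model_name(name: str) -> ModelRef:
    """Classify a model string into one of the three naming formats.

    Illustrative only: the real resolution happens on the TokenHub side.
    """
    if not name:
        raise ValueError("model name must not be empty")
    if name == "auto":
        return ModelRef("auto", None, None)
    if "/" in name:
        # Split at the first "/" into provider and model parts.
        provider, _, model = name.partition("/")
        if not provider or not model:
            raise ValueError(f"malformed provider-prefixed name: {name!r}")
        return ModelRef("provider-prefixed", provider, model)
    # Anything else is treated as a short alias.
    return ModelRef("alias", None, name)


for name in ("anthropic/claude-3-5-sonnet", "gpt-4o", "auto"):
    print(name, "->", classify_model_name(name))
```

A check like this only inspects the shape of the string. Whether a given alias or provider actually exists is still decided by TokenHub when the request arrives.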

Code examples

The following examples show how to use each naming format in a standard chat completion call.
import openai

# Point the OpenAI SDK at TokenHub instead of the default OpenAI endpoint
client = openai.OpenAI(
    base_url="https://api.tokenhub.ai/v1",
    api_key="YOUR_TOKENHUB_API_KEY",
)

# Provider-prefixed: always uses Anthropic
response = client.chat.completions.create(
    model="anthropic/claude-3-5-sonnet",
    messages=[{"role": "user", "content": "What is the boiling point of water?"}],
)
print(response.choices[0].message.content)

# Short alias: provider-agnostic
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is the boiling point of water?"}],
)
print(response.choices[0].message.content)

# Auto: TokenHub decides
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "What is the boiling point of water?"}],
)
print(response.choices[0].message.content)
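
When comparing the three formats it can help to see which model ended up serving each request. The sketch below sends one prompt per model name and records the `model` field of each response. `compare_formats` is a hypothetical helper, and it is an assumption that TokenHub reports the resolved model in that field, as the OpenAI API does for its own models.

```python
from typing import Any, Dict, Iterable


def compare_formats(client: Any, prompt: str, models: Iterable[str]) -> Dict[str, str]:
    """Send the same prompt once per model name.

    Returns a mapping from the requested model name to the model name
    reported in the response (assumed to be the resolved model).
    """
    served: Dict[str, str] = {}
    for name in models:
        response = client.chat.completions.create(
            model=name,
            messages=[{"role": "user", "content": prompt}],
        )
        served[name] = response.model
    return served
```

With the `client` defined above, a call such as `compare_formats(client, "What is the boiling point of water?", ["anthropic/claude-3-5-sonnet", "gpt-4o", "auto"])` would issue three requests, one per naming format.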

Model capabilities

Different models support different features. The table below summarizes key capabilities for the models available on TokenHub.
| Model | Context window | Multimodal input | Max output tokens |
| --- | --- | --- | --- |
| GPT-4o | 128K tokens | Images, audio | 16K tokens |
| GPT-4 | 128K tokens | Images | 8K tokens |
| GPT-3.5 Turbo | 16K tokens | None | 4K tokens |
| Claude 3.5 Sonnet | 200K tokens | Images, documents | 8K tokens |
| Claude 3 Opus | 200K tokens | Images, documents | 4K tokens |
| Claude 3 Haiku | 200K tokens | Images | 4K tokens |
| Gemini 1.5 Pro | 1M tokens | Images, video, audio | 8K tokens |
| Gemini 1.5 Flash | 1M tokens | Images, video, audio | 8K tokens |
| Mistral Large | 128K tokens | None | 4K tokens |
| Llama 3.1 70B | 128K tokens | None | 4K tokens |
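
To pick a model by requirement rather than by name, the table above can be encoded as data and filtered. This is a sketch under stated assumptions: the values are copied from the table, "K" and "M" are read as round thousands and millions of tokens, and `Capabilities`, `CAPABILITIES`, and `models_supporting` are hypothetical names rather than part of any TokenHub SDK.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Capabilities(NamedTuple):
    context_window: int          # tokens, rounded as in the table
    multimodal: Tuple[str, ...]  # non-text input types, empty if none
    max_output: int              # tokens, rounded as in the table


# Values transcribed from the capabilities table (nominal, not exact limits).
CAPABILITIES: Dict[str, Capabilities] = {
    "GPT-4o": Capabilities(128_000, ("images", "audio"), 16_000),
    "GPT-4": Capabilities(128_000, ("images",), 8_000),
    "GPT-3.5 Turbo": Capabilities(16_000, (), 4_000),
    "Claude 3.5 Sonnet": Capabilities(200_000, ("images", "documents"), 8_000),
    "Claude 3 Opus": Capabilities(200_000, ("images", "documents"), 4_000),
    "Claude 3 Haiku": Capabilities(200_000, ("images",), 4_000),
    "Gemini 1.5 Pro": Capabilities(1_000_000, ("images", "video", "audio"), 8_000),
    "Gemini 1.5 Flash": Capabilities(1_000_000, ("images", "video", "audio"), 8_000),
    "Mistral Large": Capabilities(128_000, (), 4_000),
    "Llama 3.1 70B": Capabilities(128_000, (), 4_000),
}


def models_supporting(
    min_context: int = 0,
    needs: Iterable[str] = (),
    min_output: int = 0,
) -> List[str]:
    """Return model names (in table order) that meet every requirement."""
    required = set(needs)
    return [
        name
        for name, cap in CAPABILITIES.items()
        if cap.context_window >= min_context
        and cap.max_output >= min_output
        and required <= set(cap.multimodal)
    ]


print(models_supporting(needs=("video",)))
print(models_supporting(min_context=200_000, needs=("documents",)))
```

A hard-coded table like this goes stale as models change, so it is better suited to illustrating the filtering logic than to production routing.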
To list all currently available models and their capabilities programmatically, use the GET /v1/models endpoint.
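
A minimal sketch of that lookup with the OpenAI Python SDK follows. It assumes TokenHub's GET /v1/models response follows the OpenAI list-models shape, so that the SDK's `client.models.list()` works unchanged. The capability fields in the response are not documented here, so the sketch reads only each model's `id`. `list_model_ids` is a hypothetical helper name.

```python
from typing import Any, List


def list_model_ids(client: Any) -> List[str]:
    """Return the sorted model IDs reported by the models endpoint.

    `client` is expected to be an openai.OpenAI instance whose base_url
    points at TokenHub; client.models.list() issues GET /v1/models.
    """
    return sorted(model.id for model in client.models.list())
```

With the `client` from the code examples above, `print("\n".join(list_model_ids(client)))` would print one model ID per line.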