TokenHub gives you access to dozens of models across OpenAI, Anthropic, Google, Mistral, and other providers through a single API. You can specify a model by its full provider-prefixed name, use a TokenHub alias that abstracts the provider away, or let TokenHub choose automatically based on your routing strategy.Documentation Index
Fetch the complete documentation index at: https://docs.inferoute.ai/docs/llms.txt
Use this file to discover all available pages before exploring further.
How model selection works
There are three ways to specify a model in your request:- Provider-prefixed name — explicitly targets a specific model from a specific provider:
openai/gpt-4o,anthropic/claude-3-5-sonnet,google/gemini-1.5-pro - TokenHub alias — a stable identifier like
premium,balanced, oreconomythat maps to the best available model in that tier at request time - Auto-routing — omit the
modelfield entirely and setX-Inferoute-Strategy: costorX-Inferoute-Strategy: latencyto let TokenHub decide
Model categories
Premium
Highest quality for complex reasoning, long-form writing, and multi-step tasks. Best when output quality directly affects user experience.
openai/gpt-4oanthropic/claude-3-5-sonnet- Alias:
premium
Balanced
Strong capability at moderate cost. A good default for most production workloads where you want solid quality without paying for top-tier.
openai/gpt-4o-minianthropic/claude-3-haiku- Alias:
balanced
Economy
Fastest and cheapest. Ideal for classification, extraction, summarization, and any task where throughput matters more than nuance.
openai/gpt-4o-minianthropic/claude-3-haikugoogle/gemini-flash- Alias:
economy
Long context
For documents, codebases, and conversation histories that exceed standard context windows.
anthropic/claude-3-5-sonnet(200k tokens)google/gemini-1.5-pro(1M tokens)- Alias:
long-context
Recommended models by use case
| Use case | Recommended model | Why |
|---|---|---|
| Complex reasoning, analysis | openai/gpt-4o, anthropic/claude-3-5-sonnet | Highest instruction-following and reasoning accuracy |
| Chatbots, customer support | openai/gpt-4o-mini, anthropic/claude-3-haiku | Low latency, cost-effective at high volume |
| Document summarization | anthropic/claude-3-5-sonnet, google/gemini-1.5-pro | Long context windows handle full documents |
| Classification / labeling | openai/gpt-4o-mini, google/gemini-flash | Simple tasks don’t need expensive models |
| Embeddings (search, RAG) | openai/text-embedding-3-small, openai/text-embedding-3-large | Purpose-built for vector representations |
| Code generation | openai/gpt-4o, anthropic/claude-3-5-sonnet | Strong code comprehension and generation |
Model tier comparison
Provider-agnostic code with aliases
When you hardcode a provider-prefixed model name likeopenai/gpt-4o, swapping providers requires a code change. TokenHub aliases let you decouple your application from any specific provider.
python
Embeddings
For embeddings, specify the model directly. TokenHub passes the request to the provider without modification.python
| Model | Dimensions | Best for |
|---|---|---|
openai/text-embedding-3-small | 1536 | High-volume search, RAG pipelines where cost matters |
openai/text-embedding-3-large | 3072 | Highest accuracy retrieval, semantic similarity |