TokenHub gives you access to dozens of models across OpenAI, Anthropic, Google, Mistral, and other providers through a single API. You can specify a model by its full provider-prefixed name, use a TokenHub alias that abstracts the provider away, or let TokenHub choose automatically based on your routing strategy.

How model selection works

There are three ways to specify a model in your request:
  1. Provider-prefixed name — explicitly targets a specific model from a specific provider: openai/gpt-4o, anthropic/claude-3-5-sonnet, google/gemini-1.5-pro
  2. TokenHub alias — a stable identifier like premium, balanced, or economy that maps to the best available model in that tier at request time
  3. Auto-routing — omit the model field entirely and set X-Inferoute-Strategy: cost or X-Inferoute-Strategy: latency to let TokenHub decide
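
The three options above can be sketched as request payloads. This is a minimal illustration that only builds headers and a JSON body, assuming an OpenAI-style chat body. The helper name `build_request` is hypothetical, and treating `model` and `strategy` as mutually exclusive is an assumption of this sketch; only the `model` field and the `X-Inferoute-Strategy` header come from the list above.

```python
from typing import Dict, Optional, Tuple


def build_request(prompt: str, model: Optional[str] = None,
                  strategy: Optional[str] = None) -> Tuple[Dict, Dict]:
    """Return (headers, body) for a chat request. Hypothetical helper, not an SDK call."""
    if model is None and strategy is None:
        raise ValueError("provide a model, or a strategy for auto-routing")
    if model is not None and strategy is not None:
        # Assumption of this sketch: auto-routing means the model field is omitted.
        raise ValueError("pass either a model or a strategy, not both")
    if strategy is not None and strategy not in ("cost", "latency"):
        raise ValueError(f"unknown strategy: {strategy!r}")

    headers = {"Content-Type": "application/json"}
    body = {"messages": [{"role": "user", "content": prompt}]}
    if model is not None:
        body["model"] = model  # provider-prefixed name or alias
    else:
        headers["X-Inferoute-Strategy"] = strategy  # auto-routing: no model field
    return headers, body


# 1. Provider-prefixed name
prefixed = build_request("Hello", model="openai/gpt-4o")
# 2. Alias
aliased = build_request("Hello", model="balanced")
# 3. Auto-routing
auto = build_request("Hello", strategy="cost")
```

Options 1 and 2 differ only in the string passed as `model`, so switching between them needs no structural change to the request.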

Model categories

Premium

Highest quality for complex reasoning, long-form writing, and multi-step tasks. Best when output quality directly affects user experience.
  • openai/gpt-4o
  • anthropic/claude-3-5-sonnet
  • Alias: premium

Balanced

Strong capability at moderate cost. A good default for most production workloads where you want solid quality without paying top-tier prices.
  • openai/gpt-4o-mini
  • anthropic/claude-3-haiku
  • Alias: balanced

Economy

Fastest and cheapest. Ideal for classification, extraction, summarization, and any task where throughput matters more than nuance.
  • openai/gpt-4o-mini
  • anthropic/claude-3-haiku
  • google/gemini-flash
  • Alias: economy

Long context

For documents, codebases, and conversation histories that exceed standard context windows.
  • anthropic/claude-3-5-sonnet (200k tokens)
  • google/gemini-1.5-pro (1M tokens)
  • Alias: long-context
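
One way to choose between the two long-context models is to estimate the input size and pick the smallest window that fits. The sketch below uses the context windows listed above; the four-characters-per-token estimate, the output reserve, and the function names are assumptions for illustration, not TokenHub behavior.

```python
# Context windows quoted in the list above (in tokens).
CONTEXT_WINDOWS = {
    "anthropic/claude-3-5-sonnet": 200_000,
    "google/gemini-1.5-pro": 1_000_000,
}

CHARS_PER_TOKEN = 4  # Assumption: rough heuristic for English text, not a real tokenizer.


def estimate_tokens(text: str) -> int:
    """Rough token estimate; use the provider's tokenizer when exact counts matter."""
    return len(text) // CHARS_PER_TOKEN + 1


def pick_long_context_model(text: str, reserve_for_output: int = 4_000) -> str:
    """Return the smallest-window model that fits the text plus an output reserve."""
    needed = estimate_tokens(text) + reserve_for_output
    for model, window in sorted(CONTEXT_WINDOWS.items(), key=lambda kv: kv[1]):
        if needed <= window:
            return model
    raise ValueError(f"~{needed} tokens exceeds every listed context window")
```

If the choice should stay with TokenHub, passing the `long-context` alias instead avoids this client-side logic entirely.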

| Use case | Recommended model | Why |
| --- | --- | --- |
| Complex reasoning, analysis | openai/gpt-4o, anthropic/claude-3-5-sonnet | Highest instruction-following and reasoning accuracy |
| Chatbots, customer support | openai/gpt-4o-mini, anthropic/claude-3-haiku | Low latency, cost-effective at high volume |
| Document summarization | anthropic/claude-3-5-sonnet, google/gemini-1.5-pro | Long context windows handle full documents |
| Classification / labeling | openai/gpt-4o-mini, google/gemini-flash | Simple tasks don’t need expensive models |
| Embeddings (search, RAG) | openai/text-embedding-3-small, openai/text-embedding-3-large | Purpose-built for vector representations |
| Code generation | openai/gpt-4o, anthropic/claude-3-5-sonnet | Strong code comprehension and generation |

Model tier comparison

Premium models deliver the best output quality across reasoning, writing, and instruction-following. Use them when accuracy and response quality are critical.
  • Models: openai/gpt-4o, anthropic/claude-3-5-sonnet
  • Alias: premium
python
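# `client` is assumed to be an already-configured, OpenAI-compatible SDK client;
# its setup is not shown in this section.
response = client.chat.completions.create(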
    model="openai/gpt-4o",
    messages=[
        {"role": "user", "content": "Analyze the trade-offs of microservices vs monolithic architecture for a startup with 5 engineers."},
    ],
)
node.js
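// `client` is assumed to be an already-configured, OpenAI-compatible SDK client;
// its setup is not shown in this section.
const response = await client.chat.completions.create({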
  model: "openai/gpt-4o",
  messages: [
    {
      role: "user",
      content:
        "Analyze the trade-offs of microservices vs monolithic architecture for a startup with 5 engineers.",
    },
  ],
});
Typical cost: ~$5–$15 per 1M input tokens

Provider-agnostic code with aliases

When you hardcode a provider-prefixed model name like openai/gpt-4o, swapping providers requires a code change. TokenHub aliases let you decouple your application from any specific provider.
python
# Hardcoded — requires code change to switch providers
model = "openai/gpt-4o"

# Alias — TokenHub resolves to the best model in this tier
# You can update the alias mapping in the dashboard without touching code
model = "premium"
You can configure what each alias resolves to in the Model Aliases section of your dashboard. This lets you A/B test new models, react to provider outages, or optimize costs — all without deploying code changes.
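
To keep the model name out of application code altogether, it can be read from configuration with an alias as the default. This is a minimal sketch: the environment variable name `TOKENHUB_MODEL` and the helper `resolve_model` are hypothetical and chosen only for illustration.

```python
import os

DEFAULT_MODEL = "premium"  # TokenHub alias from the text above


def resolve_model(env=None) -> str:
    """Read the model name from configuration, falling back to an alias."""
    env = os.environ if env is None else env
    # TOKENHUB_MODEL is a hypothetical variable name chosen for this sketch.
    value = env.get("TOKENHUB_MODEL", "").strip()
    return value or DEFAULT_MODEL


model = resolve_model()
```

With this shape, the default path goes through the alias, while a provider-prefixed name such as openai/gpt-4o can still be pinned per environment when a specific model is needed.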

Embeddings

For embeddings, specify the model directly. TokenHub passes the request to the provider without modification.
python
# `client` is assumed to be an already-configured, OpenAI-compatible SDK client.
response = client.embeddings.create(
    model="openai/text-embedding-3-small",
    input="TokenHub routes your LLM requests intelligently.",
)

embedding = response.data[0].embedding
print(f"Embedding dimensions: {len(embedding)}")  # 1536 for text-embedding-3-small

| Model | Dimensions | Best for |
| --- | --- | --- |
| openai/text-embedding-3-small | 1536 | High-volume search, RAG pipelines where cost matters |
| openai/text-embedding-3-large | 3072 | Highest accuracy retrieval, semantic similarity |
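
Search and semantic-similarity use cases typically compare embeddings with cosine similarity. The sketch below is a dependency-free illustration of that comparison on plain Python lists; the function name is hypothetical and nothing here is specific to TokenHub.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    if len(a) != len(b):
        # Vectors from different models (e.g. 1536 vs 3072 dimensions) are not comparable.
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

The length check matters in practice: a query embedded with one model cannot be scored against documents embedded with the other, so an index should be built and queried with the same embedding model.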