TokenHub gives you access to the leading LLM providers through one unified endpoint. Rather than managing separate API keys, SDKs, and request formats for each provider, you send all your requests to TokenHub’s OpenAI-compatible API and let the platform handle provider authentication, request translation, and response normalization. Your application code stays consistent regardless of which provider processes a given request.
Provider pricing is passed through at cost: you pay each provider’s published token rates, plus a small TokenHub routing fee. TokenHub does not mark up model prices.

Supported providers

Provider | Models | Chat | Completions | Embeddings
OpenAI | GPT-4o, GPT-4, GPT-3.5 Turbo
Anthropic | Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Haiku
Google | Gemini 1.5 Pro, Gemini 1.5 Flash
Mistral | Mistral Large, Mistral Medium, Mistral Small
Meta (hosted) | Llama 3.1 (8B, 70B, 405B)

OpenAI

TokenHub supports OpenAI’s GPT models: GPT-4o for multimodal tasks, GPT-4 for high-accuracy text tasks, and GPT-3.5 Turbo for fast, cost-efficient workloads. OpenAI models are also available for text embeddings.
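
As an illustration of the embeddings support mentioned above, the following is a minimal sketch rather than a confirmed TokenHub example. It assumes that TokenHub mirrors the OpenAI request shape (`model` plus `input`) at a `/v1/embeddings` path, and it uses a hypothetical model id, `openai/text-embedding-3-small`, that follows the `provider/model` pattern seen elsewhere on this page. The helper names `json_escape`, `embeddings_payload`, and `tokenhub_embed` are made up for this sketch.

```shell
# Sketch only: the /v1/embeddings path and the model id used below are
# assumptions, not confirmed TokenHub values.

json_escape() {
  # Escape backslashes and double quotes so the text is safe inside a JSON
  # string. Newlines and control characters are NOT handled here.
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

embeddings_payload() {
  # $1 = model id, $2 = text to embed
  printf '{"model":"%s","input":"%s"}' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

tokenhub_embed() {
  # Sends the request; same Authorization header as the chat endpoint.
  curl -sS https://api.tokenhub.ai/v1/embeddings \
    -H "Authorization: Bearer $TOKENHUB_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(embeddings_payload "$1" "$2")"
}

# Dry run: print the request body without sending anything.
printf '%s\n' "$(embeddings_payload "openai/text-embedding-3-small" "The quick brown fox")"
```

To send the request for real, call `tokenhub_embed` with a model id taken from TokenHub's own model list.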

Anthropic

Claude models are available for chat and instruction-following tasks. Claude 3.5 Sonnet offers the best balance of speed and reasoning quality. Claude 3 Opus is Anthropic’s most capable model for complex analysis. Claude 3 Haiku is optimized for low-latency, high-volume use cases.

Google

Gemini 1.5 Pro supports long-context inputs (up to 1M tokens) and multimodal inputs including images and documents. Gemini 1.5 Flash is a faster, lighter-weight variant suited for latency-sensitive applications.

Mistral

Mistral models are European-hosted and offer strong multilingual performance. Mistral Large is the flagship model; Mistral Medium and Mistral Small trade capability for lower cost and higher throughput.

Meta Llama (hosted)

Meta’s Llama 3.1 models are available through TokenHub via third-party hosting partners. You access them through the same TokenHub endpoint — no need to provision your own hosting infrastructure.

Authentication and credentials

You do not need a separate account with each provider. TokenHub manages provider credentials on your behalf. All you need is a single TokenHub API key, which you pass as a Bearer token in the Authorization header of every request.
curl https://api.tokenhub.ai/v1/chat/completions \
  -H "Authorization: Bearer $TOKENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet",
    "messages": [{"role": "user", "content": "Explain transformer architecture."}]
  }'
TokenHub securely stores and rotates provider credentials. Provider keys are never exposed in API responses or logs.
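
The curl request above can be wrapped in a small shell helper so that the key check and headers live in one place and only the model id changes between providers. This is a sketch under stated assumptions: the function names (`json_escape`, `chat_payload`, `tokenhub_chat`) are hypothetical, and the only model id taken from this page is `anthropic/claude-3-5-sonnet`.

```shell
# Sketch only: helper names are illustrative, not part of any TokenHub tooling.

json_escape() {
  # Escape backslashes and double quotes so the text is safe inside a JSON
  # string. Newlines and control characters are NOT handled here.
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

chat_payload() {
  # $1 = model id, $2 = user message
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}]}' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

tokenhub_chat() {
  # Fail early if the key is missing instead of sending an unauthenticated request.
  if [ -z "${TOKENHUB_API_KEY:-}" ]; then
    echo "TOKENHUB_API_KEY is not set" >&2
    return 1
  fi
  curl -sS https://api.tokenhub.ai/v1/chat/completions \
    -H "Authorization: Bearer $TOKENHUB_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(chat_payload "$1" "$2")"
}

# Dry run: print the request body without sending anything.
printf '%s\n' "$(chat_payload "anthropic/claude-3-5-sonnet" "Explain transformer architecture.")"
```

A real call would look like `tokenhub_chat "anthropic/claude-3-5-sonnet" "Explain transformer architecture."`. Switching providers should only require changing the first argument to another model id from TokenHub's model list.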

Real-time availability monitoring

TokenHub continuously monitors the health of every provider. Metrics tracked include:
  • Uptime: whether the provider’s API is returning successful responses
  • Latency: current p50 and p95 response times per model
  • Error rate: percentage of requests returning 5xx or timeout errors
When a provider experiences degraded performance or an outage, TokenHub’s routing engine automatically deprioritizes or bypasses it. If you have the availability or latency strategy active, your requests are rerouted to healthy providers without any action on your part. You can view current provider status at status.tokenhub.ai.
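
To make the three metrics concrete, here is a rough sketch of how they could be computed from a window of request samples. It is not TokenHub's actual implementation: the sample format (`<http_status> <latency_ms>`, with `000` standing for a timeout), the nearest-rank percentile method, and the thresholds in `provider_state` are all assumptions chosen for illustration.

```shell
# Sketch only: input format, percentile method, and thresholds are assumed.
# Each sample line: "<http_status> <latency_ms>"; status 000 marks a timeout.

error_rate() {
  # Percentage of samples that are 5xx or timeouts, with one decimal place.
  awk '{ n++; if ($1 >= 500 || $1 == 0) e++ }
       END { if (n) printf "%.1f\n", 100 * e / n }'
}

percentile() {
  # Nearest-rank percentile of the latency column; $1 = integer percentile.
  awk '{ print $2 }' | sort -n | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit
      r = int((p * NR + 99) / 100)   # ceil(p * NR / 100) for integer p
      if (r < 1) r = 1
      print v[r]
    }'
}

provider_state() {
  # $1 = error rate (%), $2 = p95 latency (ms). Thresholds are illustrative.
  awk -v e="$1" -v l="$2" 'BEGIN {
    if (e >= 50) print "bypass"
    else if (e >= 5 || l >= 2000) print "deprioritize"
    else print "healthy"
  }'
}

samples='200 100
200 120
200 110
500 900
200 130
200 105
000 3000
200 115
200 125
200 140'

printf '%s\n' "$samples" | error_rate       # prints 20.0
printf '%s\n' "$samples" | percentile 50    # prints 120
printf '%s\n' "$samples" | percentile 95    # prints 3000
provider_state 20.0 3000                    # prints deprioritize
```

In this sample window two of ten requests failed, so the provider would be deprioritized rather than bypassed under the assumed thresholds.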