TokenHub is OpenAI-compatible, so you can start routing requests through it using tools you already have. This guide walks you through creating an account, generating an API key, and making your first chat completion request.
Step 1: Sign up for TokenHub

Create a free account at tokenhub.ai. After signing up, you’ll land in the TokenHub dashboard where you can manage API keys, view usage, and configure routing.
Step 2: Generate an API key

In the dashboard, go to Settings → API Keys and click New API key. Give it a descriptive name (for example, dev-local) and optionally set an expiry date. Copy the key immediately; it won't be displayed again.
Store your API key securely. Never commit it to version control or share it in public channels. Use an environment variable instead of hardcoding it in your application.
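As a minimal sketch of that advice, the helper below reads the key from the TOKENHUB_API_KEY environment variable (the same name the examples in this guide use) and fails with a clear message if it is missing, instead of falling back to a hardcoded value. The function name `load_tokenhub_key` is hypothetical and not part of any SDK.

```python
import os


def load_tokenhub_key(var_name: str = "TOKENHUB_API_KEY") -> str:
    """Return the TokenHub API key from the environment.

    Hypothetical helper: raises instead of silently using a hardcoded key,
    so a missing or empty variable is caught at startup.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it in your shell, "
            f"e.g. export {var_name}=<your key>"
        )
    return key
```

Set the variable in your shell (for example, `export TOKENHUB_API_KEY=<your key>`) and call `load_tokenhub_key()` wherever you would otherwise paste the key.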
Step 3: Install the OpenAI SDK (optional)

TokenHub works with the standard OpenAI SDK. If you prefer to use raw HTTP, skip this step.
```bash
pip install openai
```
Step 4: Configure the base URL and API key

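Point the SDK or your HTTP client at the TokenHub API endpoint. The examples in this guide read the key from the TOKENHUB_API_KEY environment variable, so set that variable to the key you generated (for example, export TOKENHUB_API_KEY=<your key>) before running them.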
```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.tokenhub.ai/v1",
    api_key=os.environ["TOKENHUB_API_KEY"],
)
```
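If you skipped the SDK, the same base URL and key apply to a plain HTTP client. The sketch below uses only Python's standard library and assumes TokenHub follows the OpenAI convention of a `/chat/completions` path under the base URL with an `Authorization: Bearer` header; neither detail is stated in this guide, so confirm them in the API reference. `build_chat_request` is a hypothetical helper that only constructs the request, so you can inspect it before sending.

```python
import json
import urllib.request

# Base URL from this guide. The /chat/completions path and the Bearer
# auth header are assumptions based on OpenAI API compatibility.
BASE_URL = "https://api.tokenhub.ai/v1"


def build_chat_request(api_key: str, model: str, messages: list) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

To send it, pass the request to `urllib.request.urlopen` and decode the response body with `json.load`.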
Step 5: Make a chat completion request

Send a chat completion request exactly as you would with the OpenAI API.
```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.tokenhub.ai/v1",
    api_key=os.environ["TOKENHUB_API_KEY"],
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "What is AI inference routing?"}
    ],
)

print(response.choices[0].message.content)
```
The script prints only the assistant's reply. The full JSON body of a successful response looks like this:
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1748131200,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "AI inference routing is the practice of directing model requests to different LLM providers based on criteria like cost, latency, and availability."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 14,
    "completion_tokens": 32,
    "total_tokens": 46
  }
}
```
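The fields you will usually read are `choices[0].message.content` and the `usage` block. The following sketch parses a raw JSON body like the one above with the standard library, which is useful if you call the API over plain HTTP; `summarize_completion` is a hypothetical helper name.

```python
import json


def summarize_completion(raw: str) -> dict:
    """Extract the reply text and token counts from a chat completion body."""
    data = json.loads(raw)
    choice = data["choices"][0]
    usage = data.get("usage", {})  # treat a missing usage block as zeros
    return {
        "model": data.get("model"),
        "content": choice["message"]["content"],
        "finish_reason": choice.get("finish_reason"),
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
        "total_tokens": usage.get("total_tokens", 0),
    }


# Abbreviated copy of the sample response shown above.
sample = """{
  "model": "gpt-4o-mini",
  "choices": [{"index": 0,
               "message": {"role": "assistant",
                           "content": "AI inference routing is the practice of..."},
               "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 14, "completion_tokens": 32, "total_tokens": 46}
}"""

summary = summarize_completion(sample)
print(summary["total_tokens"])  # 46
```

With the OpenAI SDK you don't need to parse JSON yourself: the same values are available as attributes, for example `response.usage.total_tokens`.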
The model field is not limited to OpenAI models: you can also pass identifiers for Anthropic, Google, and Mistral models, and TokenHub maps each model name to the appropriate provider automatically. See the models reference for the full list of supported model identifiers.
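Because only the `model` field changes between providers, comparing several models comes down to looping over identifiers. In this sketch, `compare_models` is a hypothetical helper that takes the `client` configured above (or any object exposing the same `chat.completions.create` method). It does not hardcode non-OpenAI model names, since the exact identifiers TokenHub accepts are listed in the models reference, not here.

```python
def compare_models(client, models, prompt):
    """Send the same prompt to several models and collect each reply.

    `client` is expected to expose client.chat.completions.create(...),
    as the OpenAI SDK client configured above does.
    """
    replies = {}
    for model in models:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        # Keep only the first choice's text, keyed by model name.
        replies[model] = response.choices[0].message.content
    return replies
```

Pass `["gpt-4o-mini"]` plus the Anthropic, Google, or Mistral identifiers from the models reference, and the result is a dict mapping each model name to its reply, which makes differences in output easy to eyeball.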
Every request you make appears in the TokenHub dashboard under Usage. From there you can monitor token consumption, estimated costs, and which providers handled each request.
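To cross-check the dashboard from your own code, you can tally the `usage` object returned with each response. This is a minimal sketch that assumes responses shaped like the SDK's (`response.usage.prompt_tokens` and `response.usage.completion_tokens`); `UsageTally` is a hypothetical class. It counts tokens only, because costs depend on per-provider pricing that this guide does not list.

```python
from dataclasses import dataclass


@dataclass
class UsageTally:
    """Running token totals across chat completion responses."""

    prompt_tokens: int = 0
    completion_tokens: int = 0
    requests: int = 0

    def add(self, response) -> None:
        self.requests += 1
        # If a response carries no usage object, count the request only.
        usage = getattr(response, "usage", None)
        if usage is None:
            return
        self.prompt_tokens += usage.prompt_tokens or 0
        self.completion_tokens += usage.completion_tokens or 0

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens
```

Create one `UsageTally()` per session, call `tally.add(response)` after each request, and compare `tally.total_tokens` with the figures shown under Usage.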