TokenHub records token usage for every request you make and surfaces that data both in API responses and in your dashboard. Understanding how tokens are counted helps you predict costs, set appropriate budget limits, and identify opportunities to reduce spend. This page explains how tokens work, where to find usage data, and how to set limits to control costs.

What are tokens?

Tokens are the units LLMs use to process text. In English, a token is roughly four characters or three-quarters of a word, though the exact mapping depends on the model’s tokenizer. Both the text you send (prompt tokens) and the text the model generates (completion tokens) count toward usage, and you are billed for both.
Token counts for the same text can vary across models because different providers use different tokenizers. The usage figures in each response reflect the exact token count used by the model that handled that request.
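
For quick budgeting before you send a request, the four-characters-per-token rule of thumb above can be turned into a rough estimator. This is only a sketch: estimate_tokens is a hypothetical helper, not part of any TokenHub SDK, and the real count comes from the model's tokenizer and is reported in the response.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count.

    Uses the ~4 characters per token heuristic for English text.
    This is NOT a tokenizer; actual counts vary by model.
    """
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


prompt = "Explain gradient descent in one paragraph."
print(f"Estimated prompt tokens: {estimate_tokens(prompt)}")
```

Treat the result as an order-of-magnitude figure; non-English text and code often tokenize less efficiently than this heuristic suggests.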

Usage in API responses

Every response from TokenHub includes a usage object in the same format as the OpenAI API. You can read this field directly from the response to track consumption per request.
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1716652800,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The boiling point of water is 100°C (212°F) at sea level."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 17,
    "total_tokens": 35
  }
}
The three fields in usage:
  • prompt_tokens: tokens consumed by your input (system prompt + user messages + any context)
  • completion_tokens: tokens generated by the model in its response
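  • total_tokens: sum of prompt and completion tokens; cost is computed from the prompt and completion counts separately, because the two are usually priced at different rates (see Cost calculation)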
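
If you work with the raw JSON rather than an SDK object, the three fields can be read defensively as in the sketch below. read_usage is a hypothetical helper, and the fallback to zero for missing fields is an assumption about how you might want to handle incomplete responses, not documented TokenHub behavior.

```python
import json


def read_usage(body: str) -> dict:
    """Extract the usage counters from a raw JSON response body."""
    usage = json.loads(body).get("usage") or {}
    prompt = int(usage.get("prompt_tokens", 0))
    completion = int(usage.get("completion_tokens", 0))
    # Fall back to the sum if total_tokens is absent (assumption).
    total = int(usage.get("total_tokens", prompt + completion))
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": total,
    }


# Trimmed copy of the example response shown above.
raw = '{"usage": {"prompt_tokens": 18, "completion_tokens": 17, "total_tokens": 35}}'
usage = read_usage(raw)
print(usage)
```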

Accessing usage data

Per-request usage in code

Read the usage field directly from the response object in your application.
import openai

client = openai.OpenAI(
    base_url="https://api.tokenhub.ai/v1",
    api_key="YOUR_TOKENHUB_API_KEY",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain gradient descent in one paragraph."}],
)

usage = response.usage
print(f"Prompt tokens:     {usage.prompt_tokens}")
print(f"Completion tokens: {usage.completion_tokens}")
print(f"Total tokens:      {usage.total_tokens}")
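
To track consumption across many requests inside your own process, one option is a small running tally. The sketch below is illustrative only: UsageTally is a hypothetical class, and the SimpleNamespace objects stand in for the response.usage objects returned by the client so that the example runs without network access.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class UsageTally:
    """Accumulates token counts over multiple responses."""

    requests: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens

    def add(self, usage) -> None:
        # Skip responses that carry no usage object.
        if usage is None:
            return
        self.requests += 1
        self.prompt_tokens += usage.prompt_tokens
        self.completion_tokens += usage.completion_tokens


tally = UsageTally()
# Stand-ins for response.usage from two separate requests.
tally.add(SimpleNamespace(prompt_tokens=18, completion_tokens=17))
tally.add(SimpleNamespace(prompt_tokens=40, completion_tokens=120))
print(f"{tally.requests} requests, {tally.total_tokens} tokens")
```

In a real application you would call tally.add(response.usage) after each completion.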

Dashboard usage views

The TokenHub dashboard provides cumulative usage views that let you analyze consumption over time. You can filter by:
  • API key: see which keys are driving the most usage
  • Model: compare consumption across GPT-4o, Claude, Gemini, and others
  • Provider: break down usage by upstream provider
  • Date range: view daily, weekly, or monthly trends
Navigate to Dashboard → Usage to access these views. You can also export usage data as CSV for use in external billing or analytics tools.
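
Once exported, the CSV can be aggregated with the standard library. The column names used here (date, model, total_tokens) are assumptions for illustration, as are the sample rows; check the header row of your actual export and adjust accordingly.

```python
import csv
import io
from collections import defaultdict


def tokens_by_model(csv_file) -> dict:
    """Sum total_tokens per model from an open CSV file object.

    Assumes 'model' and 'total_tokens' columns (hypothetical names).
    """
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        totals[row["model"]] += int(row["total_tokens"])
    return dict(totals)


# Made-up rows standing in for an exported file.
sample = io.StringIO(
    "date,model,total_tokens\n"
    "2024-05-01,gpt-4o,1200\n"
    "2024-05-01,claude,800\n"
    "2024-05-02,gpt-4o,300\n"
)
totals = tokens_by_model(sample)
print(totals)
```

For a file on disk, pass open("usage.csv", newline="") instead of the in-memory sample.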

Cost calculation

TokenHub bills based on the token rates published by each provider, plus a small routing fee per request.
Request cost = (prompt_tokens × prompt_rate) + (completion_tokens × completion_rate) + routing_fee
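Prompt and completion rates differ for most models, so the two main levers pay off differently: shortening your system prompt cuts prompt-token cost on every request, while lowering max_tokens caps completion-token cost. On high-volume workloads, either change can meaningfully reduce spend.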
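
The formula above translates directly into code. In this sketch the rates and routing fee are placeholder values chosen for illustration, not actual TokenHub or provider prices; substitute the published per-token rates for the model you use.

```python
def request_cost(
    prompt_tokens: int,
    completion_tokens: int,
    prompt_rate: float,
    completion_rate: float,
    routing_fee: float = 0.0,
) -> float:
    """Cost of one request; rates are expressed per token."""
    return (
        prompt_tokens * prompt_rate
        + completion_tokens * completion_rate
        + routing_fee
    )


# Placeholder prices (assumptions, not real rates).
PROMPT_RATE = 0.000002      # per prompt token
COMPLETION_RATE = 0.000008  # per completion token
ROUTING_FEE = 0.0001        # per request

# Token counts taken from the example response earlier on this page.
cost = request_cost(18, 17, PROMPT_RATE, COMPLETION_RATE, ROUTING_FEE)
print(f"Estimated cost: {cost:.6f}")
```

If provider rates are quoted per million tokens, divide by 1,000,000 before passing them in.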

Setting usage limits

You can set token budget limits to prevent unexpected spend.

Monthly token budgets

Set a monthly token limit for your entire organization or for a specific API key in Dashboard → Settings → Usage Limits. When the limit is reached, TokenHub returns a 429 error with a budget_exceeded code until the budget resets at the start of the next calendar month.
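
Because an exhausted budget does not clear until the next calendar month, client code may want to tell it apart from other 429 responses and stop retrying. The sketch below assumes the code appears either at error.code or at a top-level code field in the JSON body; the exact error shape is not specified on this page, so verify it against a real response. is_budget_exceeded is a hypothetical helper.

```python
def is_budget_exceeded(status_code: int, body: dict) -> bool:
    """Return True if a response signals an exhausted token budget.

    Assumes the error code lives at body["error"]["code"] or
    body["code"] (an assumption about the payload shape).
    """
    if status_code != 429:
        return False
    error = body.get("error")
    code = error.get("code") if isinstance(error, dict) else body.get("code")
    return code == "budget_exceeded"


# Hypothetical error bodies for illustration.
print(is_budget_exceeded(429, {"error": {"code": "budget_exceeded"}}))
print(is_budget_exceeded(429, {"error": {"code": "some_other_code"}}))
```

When the check returns True, retrying with backoff will not help; surface the condition to an operator or raise the limit instead.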

Per-key limits

Assign individual limits to each API key. This is useful for multi-tenant applications where you want to cap how many tokens a given customer or environment can consume.

Alerts

You can configure email or webhook alerts when usage reaches a threshold (for example, 80% of your monthly budget). Configure alerts in Dashboard → Settings → Alerts.
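
If you also want to mirror threshold checks in your own monitoring, the logic is simple to sketch. This is a client-side illustration only, not a description of how TokenHub evaluates alerts; crossed_thresholds and the default threshold values are hypothetical.

```python
def crossed_thresholds(used_tokens, budget_tokens, thresholds=(0.5, 0.8, 1.0)):
    """Return the thresholds (as fractions of budget) already reached."""
    if budget_tokens <= 0:
        raise ValueError("budget_tokens must be positive")
    fraction = used_tokens / budget_tokens
    return [t for t in sorted(thresholds) if fraction >= t]


# Example: 800,000 tokens used out of a 1,000,000-token monthly budget.
print(crossed_thresholds(800_000, 1_000_000))
```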