Understanding Token Usage and Cost Tracking in Inferoute

Inferoute records token usage for every request you make and surfaces that data both in API responses and in your dashboard. Understanding how tokens are counted helps you predict costs, set appropriate budget limits, and identify opportunities to reduce spend. This page explains how tokens work, where to find usage data, and how to set limits to control costs.

What are tokens?

Tokens are the units LLMs use to process text. In English, a token is roughly four characters or three-quarters of a word, though the exact mapping depends on the model’s tokenizer. Both the text you send (prompt tokens) and the text the model generates (completion tokens) count toward your usage, and you are billed for both.

Token counts for the same text can vary across models because different providers use different tokenizers. The usage figures in each response reflect the exact token count used by the model that handled that request.
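For rough budgeting before a request is sent, the four-characters-per-token rule of thumb can be turned into a quick estimator. This is a minimal sketch: estimate_tokens is a hypothetical helper, not part of Inferoute or any SDK, and it ignores the model's real tokenizer. Treat the result as an approximation and rely on the usage figures in the response for exact counts.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the ~4 characters per token heuristic.

    Hypothetical helper for illustration only. Real counts depend on the
    tokenizer of the model that handles the request.
    """
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


prompt = "Explain gradient descent in one paragraph."
print(f"Estimated prompt tokens: {estimate_tokens(prompt)}")
```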

Usage in API responses

Every response from Inferoute includes a usage object in the same format as the OpenAI API. You can read this field directly from the response to track consumption per request.

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1716652800,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The boiling point of water is 100°C (212°F) at sea level."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 17,
    "total_tokens": 35
  }
}

The three fields in usage:

prompt_tokens: tokens consumed by your input (system prompt + user messages + any context)
completion_tokens: tokens generated by the model in its response
total_tokens: sum of prompt and completion tokens. Both components are billed, each at its own rate (see Cost calculation).
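If you want a running total across many requests, the three fields can be accumulated as responses arrive. The sketch below assumes each usage object has already been parsed into a plain dict with the three keys shown above. UsageTotals is a hypothetical name used for illustration, not an Inferoute API.

```python
from dataclasses import dataclass


@dataclass
class UsageTotals:
    """Running totals built from per-request usage objects (hypothetical helper)."""

    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0
    requests: int = 0

    def add(self, usage: dict) -> None:
        # Each key mirrors a field of the usage object in the response.
        self.prompt_tokens += usage["prompt_tokens"]
        self.completion_tokens += usage["completion_tokens"]
        self.total_tokens += usage["total_tokens"]
        self.requests += 1


totals = UsageTotals()
# The usage object from the sample response above.
totals.add({"prompt_tokens": 18, "completion_tokens": 17, "total_tokens": 35})
print(totals)
```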

Accessing usage data

Per-request usage in code

Read the usage field directly from the response object in your application.

# Python: point the OpenAI SDK at the Inferoute base URL.
import openai

client = openai.OpenAI(
    base_url="https://api.inferoute.ai/v1",
    api_key="YOUR_INFEROUTE_API_KEY",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain gradient descent in one paragraph."}],
)

# Token counts reported for this request.
usage = response.usage
print(f"Prompt tokens:     {usage.prompt_tokens}")
print(f"Completion tokens: {usage.completion_tokens}")
print(f"Total tokens:      {usage.total_tokens}")

// JavaScript (Node.js, ES module so that top-level await works).
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.inferoute.ai/v1",
  apiKey: process.env.INFEROUTE_API_KEY,
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Explain gradient descent in one paragraph." }],
});

// Token counts reported for this request.
const { prompt_tokens, completion_tokens, total_tokens } = response.usage;
console.log(`Prompt tokens:     ${prompt_tokens}`);
console.log(`Completion tokens: ${completion_tokens}`);
console.log(`Total tokens:      ${total_tokens}`);

Dashboard usage views

The Inferoute dashboard provides cumulative usage views that let you analyze consumption over time. You can filter by:

API key: see which keys are driving the most usage
Model: compare consumption across GPT-4o, Claude, Gemini, and others
Provider: break down usage by upstream provider
Date range: view daily, weekly, or monthly trends

Navigate to Dashboard → Usage to access these views. You can also export usage data as CSV for use in external billing or analytics tools.
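Once exported, the CSV can be summarized with a few lines of standard-library Python. This page does not document the export's column layout, so the column names used here (model and total_tokens) and the sample rows are assumptions for illustration. Adjust group_col and tokens_col to match the header of your actual export.

```python
import csv
import io
from collections import defaultdict


def tokens_by_column(csv_text: str, group_col: str = "model",
                     tokens_col: str = "total_tokens") -> dict:
    """Sum a token column per distinct value of group_col.

    The default column names are assumptions, not the documented export format.
    """
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[group_col]] += int(row[tokens_col])
    return dict(totals)


# Made-up sample rows, only to show the shape of the input.
sample = """model,total_tokens
gpt-4o,35
gpt-4o,120
example-model,60
"""
print(tokens_by_column(sample))
```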

Cost calculation

Inferoute bills based on the token rates published by each provider, plus a small routing fee per request.

Request cost = (prompt_tokens × prompt_rate) + (completion_tokens × completion_rate) + routing_fee

Prompt and completion tokens are priced separately, and the rates differ for most models. On high-volume workloads, shortening your system prompt (fewer prompt tokens on every request) or lowering max_tokens (a cap on completion tokens) can meaningfully reduce costs.
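The formula above translates directly into code. In this sketch the rates and the routing fee are placeholder numbers, not real Inferoute or provider prices, and request_cost is a hypothetical helper. Rates are expressed per token, so a price quoted per million tokens is divided by 1,000,000 first.

```python
def request_cost(prompt_tokens: int, completion_tokens: int,
                 prompt_rate: float, completion_rate: float,
                 routing_fee: float = 0.0) -> float:
    """Cost of one request: per-token charges plus a flat routing fee."""
    return (prompt_tokens * prompt_rate
            + completion_tokens * completion_rate
            + routing_fee)


# Placeholder prices for illustration only (USD).
PROMPT_RATE = 1.00 / 1_000_000      # per prompt token
COMPLETION_RATE = 3.00 / 1_000_000  # per completion token
ROUTING_FEE = 0.0001                # per request

# Token counts taken from the sample response earlier on this page.
cost = request_cost(18, 17, PROMPT_RATE, COMPLETION_RATE, ROUTING_FEE)
print(f"Estimated cost: ${cost:.6f}")
```

Multiplying the result by your expected request volume gives a first-order monthly estimate that you can compare against a budget limit.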

Setting usage limits

You can set token budget limits to prevent unexpected spend.

Monthly token budgets

Set a monthly token limit for your entire organization or for a specific API key in Dashboard → Settings → Usage Limits. When the limit is reached, Inferoute returns a 429 error with a budget_exceeded code until the budget resets at the start of the next calendar month.
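Client code usually needs to tell an exhausted budget apart from other 429 responses, because retrying will not help until the budget resets. The sketch below makes two assumptions that this page does not confirm: that the error body follows the OpenAI-style shape {"error": {"code": ...}}, and that the monthly reset is evaluated in UTC. Both function names are hypothetical.

```python
from datetime import datetime, timezone


def is_budget_exceeded(status_code: int, body: dict) -> bool:
    """True if the response is a 429 carrying the budget_exceeded code.

    Assumes an OpenAI-style error body: {"error": {"code": "..."}}.
    """
    if status_code != 429:
        return False
    error = body.get("error")
    return isinstance(error, dict) and error.get("code") == "budget_exceeded"


def next_budget_reset(now: datetime) -> datetime:
    """Start of the next calendar month (UTC is an assumption)."""
    if now.month == 12:
        return datetime(now.year + 1, 1, 1, tzinfo=timezone.utc)
    return datetime(now.year, now.month + 1, 1, tzinfo=timezone.utc)


body = {"error": {"code": "budget_exceeded"}}
if is_budget_exceeded(429, body):
    reset = next_budget_reset(datetime.now(timezone.utc))
    print(f"Budget exhausted; pausing requests until {reset.isoformat()}")
```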

Per-key limits

Assign individual limits to each API key. This is useful for multi-tenant applications where you want to cap how many tokens a given customer or environment can consume.

Alerts

You can configure email or webhook alerts when usage reaches a threshold (for example, 80% of your monthly budget). Configure alerts in Dashboard → Settings → Alerts.
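If you also track usage in your own application, the same threshold logic is easy to mirror locally. The following is a minimal sketch of such a check, not a description of how Inferoute evaluates alerts. crossed_thresholds and its default thresholds are hypothetical.

```python
def crossed_thresholds(used_before: int, used_after: int, budget: int,
                       thresholds=(0.5, 0.8, 1.0)) -> list:
    """Return the budget fractions crossed as usage moves from used_before to used_after."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    # A threshold counts as crossed only on the request that first reaches it,
    # so each alert fires once per budget period.
    return [t for t in thresholds if used_before < t * budget <= used_after]


# Example: a 1,000,000-token monthly budget, with usage passing the 80% mark.
for fraction in crossed_thresholds(790_000, 810_000, 1_000_000):
    print(f"Usage reached {fraction:.0%} of the monthly budget")
```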
