TokenHub uses standard HTTP status codes to indicate the outcome of every request. Errors always return a JSON body so your application can parse the failure reason programmatically and react accordingly.

Error response format

All error responses share this structure:
{
  "error": {
    "type": "authentication_error",
    "message": "Invalid API key provided.",
    "code": "invalid_api_key"
  }
}
| Field | Type | Description |
| --- | --- | --- |
| type | string | Broad error category |
| message | string | Human-readable description of the error |
| code | string | Machine-readable error code for programmatic handling |
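
As a minimal sketch of how a client might read this structure, the snippet below parses a raw response body into a small dataclass. The `ApiError` class, the `parse_error` helper, and the `unknown_error` / `unknown` fallback values are hypothetical names introduced for illustration; they are not part of the documented API.

```python
import json
from dataclasses import dataclass


@dataclass
class ApiError:
    """Hypothetical container mirroring the documented error fields."""
    type: str
    message: str
    code: str


def parse_error(body: str) -> ApiError:
    """Parse an error response body into an ApiError.

    Falls back to placeholder values when the body does not have the
    documented {"error": {...}} shape (for example, an HTML page returned
    by an intermediate proxy).
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = None

    err = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(err, dict):
        # Fallback values are assumptions made for this sketch.
        return ApiError(type="unknown_error", message=body, code="unknown")

    return ApiError(
        type=str(err.get("type", "unknown_error")),
        message=str(err.get("message", "")),
        code=str(err.get("code", "unknown")),
    )


sample = (
    '{"error": {"type": "authentication_error", '
    '"message": "Invalid API key provided.", "code": "invalid_api_key"}}'
)
error = parse_error(sample)
print(error.type, error.code)  # authentication_error invalid_api_key
```

Branching on `code` rather than `message` keeps the handling stable if the human-readable wording changes.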

Error codes

| Status | Type | When it occurs |
| --- | --- | --- |
| 400 Bad Request | invalid_request_error | Malformed JSON body or missing required fields |
| 401 Unauthorized | authentication_error | Missing or invalid API key |
| 402 Payment Required | usage_limit_exceeded | Monthly usage limit has been reached |
| 404 Not Found | not_found_error | Model not found or endpoint path is invalid |
| 422 Unprocessable Entity | invalid_request_error | Valid JSON but parameter values fail validation (e.g., temperature out of range) |
| 429 Too Many Requests | rate_limit_error | Request rate limit exceeded |
| 500 Internal Server Error | api_error | Internal TokenHub error or unexpected provider failure |
| 503 Service Unavailable | provider_error | All configured providers (including fallbacks) are unavailable |

Retry logic

Not all errors are worth retrying. Use this guidance to decide.

Retry these errors, which are transient and typically resolve automatically:
  • 429 Too Many Requests — back off and retry after the interval in the Retry-After header
  • 500 Internal Server Error — retry with exponential backoff
  • 503 Service Unavailable — retry with exponential backoff; consider adding fallback models via X-Inferoute-Fallback
Do not retry these errors — they indicate a problem with the request itself:
  • 400 Bad Request — fix the request body before retrying
  • 401 Unauthorized — provide a valid API key
  • 402 Payment Required — upgrade your plan or wait for the billing period to reset
  • 404 Not Found — check the model ID and endpoint path
  • 422 Unprocessable Entity — correct the invalid parameter values
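
The guidance above can be sketched as a small decision helper. This is an illustrative sketch, not an official client: `should_retry` and `retry_delay` are hypothetical names, the 1-second base and 60-second cap are assumed defaults, and `Retry-After` is assumed to carry a number of seconds (the HTTP-date form is not handled).

```python
import random
from typing import Mapping, Optional

# Status codes the guidance above marks as transient.
RETRYABLE = {429, 500, 503}


def should_retry(status: int) -> bool:
    """Return True for statuses that are worth retrying."""
    return status in RETRYABLE


def retry_delay(
    status: int,
    headers: Mapping[str, str],
    attempt: int,
    base: float = 1.0,   # assumed starting delay in seconds
    cap: float = 60.0,   # assumed upper bound in seconds
) -> Optional[float]:
    """Return how long to wait before the next attempt, or None to give up.

    `attempt` is zero-based. `headers` is looked up with the exact key
    "Retry-After", so pass a case-normalized mapping if your HTTP client
    does not already provide one.
    """
    if not should_retry(status):
        return None

    if status == 429:
        value = headers.get("Retry-After")
        if value is not None:
            try:
                return max(0.0, float(value))
            except ValueError:
                pass  # not a plain number of seconds; fall through to backoff

    # Exponential backoff with up to 10% jitter.
    backoff = min(base * (2 ** attempt), cap)
    return backoff + random.uniform(0, backoff * 0.1)


print(retry_delay(429, {"Retry-After": "7"}, attempt=0))  # 7.0
print(retry_delay(400, {}, attempt=0))                    # None
```

Keeping the decision separate from the HTTP call makes it straightforward to unit test and to reuse with any client library.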

Exponential backoff example

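import random
import time

from openai import APIStatusError, OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.tokenhub.ai/v1",
    max_retries=0,  # disable the SDK's built-in retries so attempts are not multiplied
)

RETRYABLE_STATUS_CODES = {429, 500, 503}
MAX_RETRIES = 5

def chat_with_backoff(messages, model="openai/gpt-4o"):
    """Call the chat completions endpoint, retrying transient errors with backoff."""
    delay = 1.0
    for attempt in range(MAX_RETRIES):
        try:
            return client.chat.completions.create(
                model=model,
                messages=messages,
            )
        except APIStatusError as e:
            # Non-retryable status or final attempt: surface the error to the caller.
            if e.status_code not in RETRYABLE_STATUS_CODES or attempt == MAX_RETRIES - 1:
                raise
            # Prefer the server's Retry-After hint (in seconds) when it is present.
            retry_after = e.response.headers.get("retry-after")
            try:
                wait = float(retry_after) if retry_after else delay
            except ValueError:
                wait = delay  # header was an HTTP date or otherwise unparsable
            wait += random.uniform(0, delay * 0.1)  # small jitter to avoid synchronized retries
            print(f"Attempt {attempt + 1} failed with {e.status_code}. Retrying in {wait:.1f}s...")
            time.sleep(wait)
            delay = min(delay * 2, 60)  # cap at 60 seconds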

response = chat_with_backoff([{"role": "user", "content": "Hello!"}])
print(response.choices[0].message.content)