How Inferoute Intelligently Routes Your AI Requests

Inferoute’s routing engine sits at the core of every API call you make. Instead of sending requests to a single hardcoded provider, Inferoute evaluates your routing strategy and the current state of all connected providers, then forwards your request to the option that best matches your requirements. This happens transparently: your code stays the same regardless of which provider handles the work.

Routing strategies

You can control how Inferoute selects a provider by specifying a routing strategy. Each strategy optimizes for a different dimension of performance.

Latency-optimized
Cost-optimized
Availability
Round-robin

With the latency-optimized strategy, Inferoute routes to the provider with the lowest measured response time for the requested model at the moment of the call. This is the best choice for interactive applications where response speed is critical.

import openai

client = openai.OpenAI(
    base_url="https://api.inferoute.ai/v1",
    api_key="YOUR_INFEROUTE_API_KEY",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this document."}],
    extra_headers={"X-Inferoute-Strategy": "latency"},
)

With the cost-optimized strategy, Inferoute selects the provider offering the lowest combined prompt and completion token price for the requested model. Use this strategy for batch workloads, background jobs, or any task where a few extra milliseconds of latency are acceptable.

import openai

client = openai.OpenAI(
    base_url="https://api.inferoute.ai/v1",
    api_key="YOUR_INFEROUTE_API_KEY",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Classify these 1000 support tickets."}],
    extra_headers={"X-Inferoute-Strategy": "cost"},
)

With the availability strategy, Inferoute routes to the provider with the highest current uptime and lowest observed error rate. Use this when consistency matters more than raw speed or price.

import openai

client = openai.OpenAI(
    base_url="https://api.inferoute.ai/v1",
    api_key="YOUR_INFEROUTE_API_KEY",
)

response = client.chat.completions.create(
    model="claude-3-5-sonnet",
    messages=[{"role": "user", "content": "Generate a contract draft."}],
    extra_headers={"X-Inferoute-Strategy": "availability"},
)

With the round-robin strategy, Inferoute distributes requests evenly across all available providers for the requested model. This spreads load and provides a natural form of redundancy without prioritizing any single dimension.

import openai

client = openai.OpenAI(
    base_url="https://api.inferoute.ai/v1",
    api_key="YOUR_INFEROUTE_API_KEY",
)

response = client.chat.completions.create(
    model="gemini-1.5-pro",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    extra_headers={"X-Inferoute-Strategy": "round-robin"},
)

If you do not specify X-Inferoute-Strategy, Inferoute uses the balanced strategy by default. Balanced weighs latency, cost, and availability together to make a reasonable choice for most workloads.
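To make the differences between strategies concrete, here is a toy model of provider selection. It is only an illustration: the Provider fields, the select_provider function, and the equal weighting used for balanced are all assumptions made for this sketch, not Inferoute's actual implementation or scoring formula.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Provider:
    """Hypothetical snapshot of one provider's current state."""
    name: str
    latency_ms: float    # recent measured response time
    price_per_1k: float  # combined prompt + completion token price
    error_rate: float    # fraction of recent requests that failed


_round_robin_counter = count()  # shared position for the round-robin strategy


def _normalize(value, values):
    """Scale value into [0, 1] relative to the other providers."""
    lo, hi = min(values), max(values)
    return 0.0 if hi == lo else (value - lo) / (hi - lo)


def select_provider(providers, strategy="balanced"):
    """Pick one provider from a non-empty list according to the strategy."""
    if not providers:
        raise ValueError("no providers available")
    if strategy == "latency":
        return min(providers, key=lambda p: p.latency_ms)
    if strategy == "cost":
        return min(providers, key=lambda p: p.price_per_1k)
    if strategy == "availability":
        return min(providers, key=lambda p: p.error_rate)
    if strategy == "round-robin":
        return providers[next(_round_robin_counter) % len(providers)]
    if strategy == "balanced":
        # Assumption: equal weights on normalized latency, cost, and error rate.
        latencies = [p.latency_ms for p in providers]
        prices = [p.price_per_1k for p in providers]
        errors = [p.error_rate for p in providers]
        return min(
            providers,
            key=lambda p: _normalize(p.latency_ms, latencies)
            + _normalize(p.price_per_1k, prices)
            + _normalize(p.error_rate, errors),
        )
    raise ValueError(f"unknown strategy: {strategy}")
```

In this sketch, a provider that is neither the fastest nor the cheapest can still win under balanced if it is reasonably good on every dimension.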

Specifying routing preferences

You have two ways to communicate your routing preference to Inferoute.

Via the model parameter

You can embed the strategy directly in the model name using a routing suffix. This works with any OpenAI-compatible client without modifying headers.

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.inferoute.ai/v1",
  apiKey: process.env.INFEROUTE_API_KEY,
});

// Route to the cheapest available GPT-4o endpoint
const response = await client.chat.completions.create({
  model: "gpt-4o:cost",
  messages: [{ role: "user", content: "Draft a product description." }],
});

Supported suffixes: :latency, :cost, :availability, :round-robin.
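If you build model names programmatically, a small helper can keep the suffix handling in one place. The sketch below is client-side convenience code only; the function names are hypothetical, and it assumes the suffix is always the last colon-separated segment, which may not match how Inferoute parses model names internally.

```python
# Suffixes listed in this section.
SUPPORTED_SUFFIXES = {"latency", "cost", "availability", "round-robin"}


def with_strategy(model: str, strategy: str) -> str:
    """Append a routing suffix to a model name, e.g. 'gpt-4o' -> 'gpt-4o:cost'."""
    if strategy not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported routing suffix: {strategy}")
    return f"{model}:{strategy}"


def split_strategy(model: str):
    """Split 'gpt-4o:cost' into ('gpt-4o', 'cost').

    Returns (model, None) when the last segment is not a known suffix.
    """
    base, sep, suffix = model.rpartition(":")
    if sep and suffix in SUPPORTED_SUFFIXES:
        return base, suffix
    return model, None
```

For example, with_strategy("gpt-4o", "cost") produces the same "gpt-4o:cost" value used in the request above.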

Via the X-Inferoute-Strategy header

Pass the strategy as a custom request header. This keeps your model names clean and lets you change strategy at the request level without altering model identifiers.

curl https://api.inferoute.ai/v1/chat/completions \
  -H "Authorization: Bearer $INFEROUTE_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Inferoute-Strategy: latency" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
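When the strategy varies per request, it can help to validate the value before it reaches the wire. The helper below is a minimal sketch with a hypothetical name (strategy_headers). It assumes the header accepts the same four values shown in this document and that omitting the header is how you get the default balanced behavior.

```python
HEADER_NAME = "X-Inferoute-Strategy"

# Values used with the header elsewhere in this document.
HEADER_STRATEGIES = {"latency", "cost", "availability", "round-robin"}


def strategy_headers(strategy=None):
    """Build the extra headers for one request.

    Returns an empty dict when no strategy is given, so the request
    falls back to the default strategy.
    """
    if strategy is None:
        return {}
    if strategy not in HEADER_STRATEGIES:
        raise ValueError(f"unsupported strategy: {strategy}")
    return {HEADER_NAME: strategy}
```

With the OpenAI Python client, the result can be passed directly as extra_headers=strategy_headers("latency") in a chat.completions.create call.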

Fallback behavior

Inferoute automatically retries failed requests on alternative providers. If the primary provider returns an error or times out, Inferoute selects the next best option according to your strategy and retries the request, with no additional code on your side. Fallback behavior covers:

Provider-side 5xx errors
Request timeouts
Rate limit responses (429s) when no retry window is available

The retry chain continues until a provider returns a successful response or all eligible providers for that model are exhausted. If all providers fail, Inferoute returns an error with details about each attempted provider.
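The retry chain described above can be pictured with the following sketch. It models the behavior conceptually and is not Inferoute's code: the exception classes, the send callback, and the attempt format are assumptions for illustration. As a simplification, it treats every 429 as a reason to move to the next provider.

```python
class ProviderError(Exception):
    """Hypothetical error raised when a provider returns an HTTP error status."""

    def __init__(self, provider, status):
        super().__init__(f"{provider} returned HTTP {status}")
        self.provider = provider
        self.status = status


class AllProvidersFailed(Exception):
    """Raised when every eligible provider has been tried without success."""

    def __init__(self, attempts):
        super().__init__(f"all providers failed: {attempts}")
        self.attempts = attempts  # list of (provider, reason) pairs


def is_retryable(status):
    """5xx errors and rate limits trigger fallback; other errors do not."""
    return status == 429 or 500 <= status <= 599


def call_with_fallback(providers, send):
    """Try providers in strategy order until one succeeds.

    providers: provider names, already ordered best-first by the strategy.
    send: callable taking a provider name and returning a response, or
          raising ProviderError / TimeoutError.
    """
    attempts = []
    for name in providers:
        try:
            return send(name)
        except TimeoutError:
            attempts.append((name, "timeout"))
        except ProviderError as exc:
            if not is_retryable(exc.status):
                raise  # e.g. a 400: another provider would not help
            attempts.append((name, f"HTTP {exc.status}"))
    raise AllProvidersFailed(attempts)
```

The attempts list mirrors the per-provider details that the final error carries when every provider fails.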
