Get Started with Inferoute

Inferoute is OpenAI-compatible, so you can start routing requests through it using tools you already have. This guide walks you through creating an account, generating an API key, and making your first chat completion request.

Create a free account at inferoute.ai. After signing up, you’ll land in the Inferoute dashboard where you can manage API keys, view usage, and configure routing.

Generate an API key

In the dashboard, go to Settings → API Keys and click New API key. Give it a descriptive name (for example, dev-local) and optionally set an expiry date. Copy the key immediately; it won’t be displayed again.

Store your API key securely. Never commit it to version control or share it in public channels. Use an environment variable instead of hardcoding it in your application.
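As a minimal sketch of the environment-variable approach, the snippet below reads the key from INFEROUTE_API_KEY (the variable name used in the examples in this guide) and fails with a readable error when it is missing. The helper names load_api_key and mask are hypothetical and are not part of Inferoute or the OpenAI SDK.

```python
import os


def load_api_key(env=os.environ, name="INFEROUTE_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    `load_api_key` is an illustrative helper, not an Inferoute API.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running this script.")
    return key


def mask(key, visible=4):
    """Mask a key for log output, keeping only the last few characters."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

Calling load_api_key() once at startup surfaces a missing key immediately rather than at the first request, and mask() keeps the full key out of logs.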

Install the OpenAI SDK (optional)

Inferoute works with the standard OpenAI SDK. If you prefer to use raw HTTP, skip this step.

Python:

pip install openai

Node.js:

npm install openai

Configure the base URL and API key

Point the SDK or your HTTP client at the Inferoute API endpoint, https://api.inferoute.ai/v1. The examples below read the key from the INFEROUTE_API_KEY environment variable, so set that variable to the key you generated.

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inferoute.ai/v1",
    api_key=os.environ["INFEROUTE_API_KEY"],
)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.inferoute.ai/v1",
  apiKey: process.env.INFEROUTE_API_KEY,
});

Make a chat completion request

Send a chat completion request exactly as you would with the OpenAI API.

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inferoute.ai/v1",
    api_key=os.environ["INFEROUTE_API_KEY"],
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "What is AI inference routing?"}
    ],
)

print(response.choices[0].message.content)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.inferoute.ai/v1",
  apiKey: process.env.INFEROUTE_API_KEY,
});

const response = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [
    { role: "user", content: "What is AI inference routing?" }
  ],
});

console.log(response.choices[0].message.content);

curl https://api.inferoute.ai/v1/chat/completions \
  -H "Authorization: Bearer $INFEROUTE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      { "role": "user", "content": "What is AI inference routing?" }
    ]
  }'
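If you skipped the SDK and want the raw HTTP call from Python, the curl command above can be sketched with the standard library alone. This assumes the same endpoint, headers, and JSON body as the curl example; build_chat_request and send are hypothetical helper names chosen for illustration.

```python
import json
import urllib.request

BASE_URL = "https://api.inferoute.ai/v1"  # endpoint used throughout this guide


def build_chat_request(api_key, prompt, model="gpt-4o-mini"):
    """Build (but do not send) a POST request for /chat/completions."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

For example, send(build_chat_request(os.environ["INFEROUTE_API_KEY"], "What is AI inference routing?")) should return the parsed response as a dictionary (with os imported). Keeping request construction separate from sending makes the headers and body easy to inspect without network access.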

A successful response looks like this:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1748131200,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "AI inference routing is the practice of directing model requests to different LLM providers based on criteria like cost, latency, and availability."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 14,
    "completion_tokens": 32,
    "total_tokens": 46
  }
}
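To show how the fields in this response are typically read, here is a small sketch that parses a body of the same shape and pulls out the reply text, the finish reason, and the token counts. The sample is copied from the response above with the content shortened; summarize is a hypothetical helper name.

```python
import json

# Sample body with the same shape as the response shown above (content shortened).
RAW = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1748131200,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "AI inference routing is ..."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 14, "completion_tokens": 32, "total_tokens": 46}
}
"""


def summarize(response):
    """Extract the first choice's text, finish reason, and token usage."""
    choice = response["choices"][0]
    usage = response.get("usage", {})
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }


summary = summarize(json.loads(RAW))
print(summary["text"])
print(summary["finish_reason"], summary["total_tokens"])
```

In the OpenAI response format, a finish_reason of "stop" means the model ended its reply on its own, while "length" indicates the output was cut off by a token limit, so it is worth checking before using the text.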

The model field is not limited to OpenAI models: you can pass any supported model identifier, including Anthropic, Google, and Mistral models. Inferoute maps the model name to the appropriate provider automatically. See the models reference for the full list of supported model identifiers.

Every request you make appears in the Inferoute dashboard under Usage. From there you can monitor token consumption, estimated costs, and which providers handled each request.
