Inferoute REST API Reference

Inferoute exposes an OpenAI-compatible REST API that routes inference requests to multiple LLM providers through a single endpoint. The API is served at https://api.inferoute.ai/v1, accepts JSON request bodies, and returns JSON, so tooling built for the OpenAI API should work with Inferoute once it is pointed at this base URL.

Base URL

https://api.inferoute.ai/v1

Authentication

All requests require a Bearer token in the Authorization header:

Authorization: Bearer YOUR_API_KEY

See Authentication for full details and code examples.
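
As a minimal sketch of the header above, the following builds an authenticated request with Python's standard library without sending it. The `INFEROUTE_API_KEY` environment variable name and the `build_request` helper are assumptions made for illustration; they are not part of the Inferoute API.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.inferoute.ai/v1"


def build_request(path, payload=None, api_key=None):
    """Build (but do not send) an authenticated request to the API."""
    # Assumption: the key lives in an environment variable named INFEROUTE_API_KEY.
    key = api_key or os.environ.get("INFEROUTE_API_KEY")
    if not key:
        raise ValueError("missing API key")
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(BASE_URL + path, data=data)
    request.add_header("Authorization", f"Bearer {key}")
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


req = build_request("/models", api_key="YOUR_API_KEY")
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; a request built with a payload becomes a `POST` automatically.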

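Available endpoints

The paths below are relative to the host (https://api.inferoute.ai). The /v1 prefix they show is the same one that ends the base URL, so it is not repeated when you build a request from the base URL.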

Method	Endpoint	Description
`POST`	`/v1/chat/completions`	Generate chat responses from a messages array
`POST`	`/v1/completions`	Generate text completions from a prompt string
`POST`	`/v1/embeddings`	Generate vector embeddings for text input
`GET`	`/v1/models`	List all available models and their capabilities

Chat completions

Send a messages array and receive an assistant reply. The endpoint supports streaming, function calling, and the standard OpenAI chat completion parameters.
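
Streaming is the part of this endpoint that most changes client code. The sketch below assumes Inferoute streams in the same server-sent-events format as the OpenAI API (`data:` lines carrying JSON chunks, ending with `data: [DONE]`). This page does not spell out the wire format, so treat it and the `iter_stream_text` helper as assumptions.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from OpenAI-style SSE lines (assumed format)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hand-written sample lines, not captured from a real response.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))  # prints "Hello!"
```

This parsing is only needed when calling the endpoint over raw HTTP. The OpenAI Python SDK does it for you when `stream=True` is passed to `client.chat.completions.create(...)`.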

Completions

Legacy text completion endpoint. Accepts a prompt string and returns generated text.
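
For comparison with the chat format, here is a sketch of the legacy request body and of reading the result. It assumes the OpenAI completions shape (`prompt` in the request, `choices[].text` in the response); the model ID and the sample response are placeholders, not real Inferoute output.

```python
import json


def completion_payload(model, prompt, max_tokens=64):
    """Build the JSON body for POST /v1/completions (OpenAI-style shape, assumed)."""
    return {"model": model, "prompt": prompt, "max_tokens": max_tokens}


def completion_text(response_body):
    """Return the generated text of the first choice."""
    return response_body["choices"][0]["text"]


# "provider/model-id" is a placeholder, not a real model ID.
body = completion_payload("provider/model-id", "Once upon a time")
print(json.dumps(body))

# Hand-written sample response, for illustration only.
sample_response = {"choices": [{"index": 0, "text": " there was a router."}]}
print(completion_text(sample_response))
```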

Embeddings

Generate vector representations of text for semantic search, RAG pipelines, and classification.
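
To illustrate the semantic-search use case, the sketch below ranks documents by cosine similarity to a query vector. It assumes the OpenAI embeddings response shape (one `{"index": i, "embedding": [...]}` entry per input under `data`); the three-dimensional vectors are toy values, and `rank_documents` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, response_body, documents):
    """Sort documents by similarity to the query, most similar first."""
    # Assumption: OpenAI-style response whose "index" matches the input order.
    vectors = {item["index"]: item["embedding"] for item in response_body["data"]}
    scored = [
        (cosine_similarity(query_vec, vectors[i]), doc)
        for i, doc in enumerate(documents)
    ]
    return [doc for _, doc in sorted(scored, key=lambda pair: pair[0], reverse=True)]


documents = ["routing guide", "billing FAQ"]
# Toy vectors; real embeddings have hundreds or thousands of dimensions.
sample_response = {"data": [
    {"index": 0, "embedding": [0.9, 0.1, 0.0]},
    {"index": 1, "embedding": [0.0, 0.2, 0.9]},
]}
print(rank_documents([1.0, 0.0, 0.0], sample_response, documents))
```

In a real pipeline the query vector would come from the same endpoint and the same model as the document vectors, since similarity scores are only meaningful within one embedding space.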

Models

Retrieve the full list of models available on Inferoute, including provider, context window, and capabilities.
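
One use of this list is choosing a model that fits a prompt. The sketch below filters a models response by context window. The `{"object": "list", "data": [...]}` envelope follows the OpenAI spec, but the `provider` and `context_window` field names are hypothetical, since this page does not give the exact schema; check a real response before relying on them.

```python
def models_with_context(response_body, min_context):
    """Return IDs of models whose context window is at least min_context tokens."""
    selected = []
    for model in response_body.get("data", []):
        # "context_window" is a hypothetical field name.
        if model.get("context_window", 0) >= min_context:
            selected.append(model["id"])
    return selected


# Hand-written sample; values are illustrative, not real catalogue data.
sample_response = {
    "object": "list",
    "data": [
        {"id": "openai/gpt-4o", "provider": "openai", "context_window": 128000},
        {"id": "example/small-model", "provider": "example", "context_window": 8000},
    ],
}
print(models_with_context(sample_response, 32000))
```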

OpenAI SDK compatibility

Because Inferoute implements the OpenAI API spec, you can use any OpenAI SDK by pointing its base URL option (`base_url` in Python, `baseURL` in Node.js) at Inferoute:

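# Python (openai package)
from openai import OpenAI

# Point the client at Inferoute instead of the default OpenAI endpoint.
client = OpenAI(
    api_key="YOUR_INFEROUTE_API_KEY",
    base_url="https://api.inferoute.ai/v1",
)

# The call itself is unchanged from the stock OpenAI SDK.
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

// Node.js (openai package, run as an ES module so top-level await works)
import OpenAI from "openai";

// Point the client at Inferoute instead of the default OpenAI endpoint.
const client = new OpenAI({
  apiKey: "YOUR_INFEROUTE_API_KEY",
  baseURL: "https://api.inferoute.ai/v1",
});

// The call itself is unchanged from the stock OpenAI SDK.
const response = await client.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);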

Inferoute-specific headers

Inferoute extends the OpenAI request/response contract with a small set of headers for routing control and observability.

Request headers

Header	Values	Description
`X-Inferoute-Strategy`	`cost`, `latency`, `availability`, `round-robin`	Routing strategy for this request
`X-Inferoute-Fallback`	Comma-separated model IDs	Ordered list of fallback models if the primary is unavailable
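
The sketch below shows one way to assemble these two headers and reject an unsupported strategy before the request is sent. The `routing_headers` helper and the fallback model IDs are hypothetical, and joining IDs with a bare comma (no spaces) is an assumption about the expected format.

```python
ALLOWED_STRATEGIES = {"cost", "latency", "availability", "round-robin"}


def routing_headers(strategy=None, fallbacks=()):
    """Build the optional Inferoute routing headers for one request."""
    headers = {}
    if strategy is not None:
        if strategy not in ALLOWED_STRATEGIES:
            raise ValueError(f"unknown routing strategy: {strategy!r}")
        headers["X-Inferoute-Strategy"] = strategy
    if fallbacks:
        # Order is preserved because the fallback list is documented as ordered.
        headers["X-Inferoute-Fallback"] = ",".join(fallbacks)
    return headers


# Placeholder model IDs, for illustration only.
headers = routing_headers("latency", ["example/first-backup", "example/second-backup"])
print(headers)
```

With the OpenAI Python SDK, a dictionary like this can be passed per request through the `extra_headers` argument of `client.chat.completions.create(...)`, or for every request through `default_headers` when constructing the client.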

Response headers

Header	Description
`X-Inferoute-Provider`	The provider that served the request (e.g., `openai`, `anthropic`)
`X-Inferoute-Request-Id`	Unique request identifier for debugging and support
`X-Inferoute-Latency-Ms`	End-to-end request latency in milliseconds
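
As a sketch of how these headers might feed logging or debugging, the function below extracts them from any header mapping. HTTP header names are case-insensitive, so the lookup lowercases them first. The sample values, including the request ID format, are made up, and parsing the latency as a float is a precaution because this page does not say whether the value is always an integer.

```python
def routing_info(headers):
    """Extract Inferoute metadata from response headers (case-insensitive lookup)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    latency = lowered.get("x-inferoute-latency-ms")
    return {
        "provider": lowered.get("x-inferoute-provider"),
        "request_id": lowered.get("x-inferoute-request-id"),
        "latency_ms": float(latency) if latency is not None else None,
    }


# Hand-written sample headers; the request ID format is invented.
sample = {
    "X-Inferoute-Provider": "openai",
    "X-Inferoute-Request-Id": "req_example123",
    "X-Inferoute-Latency-Ms": "412",
}
print(routing_info(sample))
```

With the OpenAI Python SDK, response headers are reachable through `client.chat.completions.with_raw_response.create(...)`, whose result exposes `.headers` and a `.parse()` method that returns the usual completion object.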
