Choose the Right Model for Your Use Case

Inferoute gives you access to dozens of models across OpenAI, Anthropic, Google, Mistral, and other providers through a single API. You can specify a model by its full provider-prefixed name, use an Inferoute alias that abstracts the provider away, or let Inferoute choose automatically based on your routing strategy.

How model selection works

There are three ways to specify a model in your request:

Provider-prefixed name — explicitly targets a specific model from a specific provider: openai/gpt-4o, anthropic/claude-3-5-sonnet, google/gemini-1.5-pro
Inferoute alias — a stable identifier like premium, balanced, or economy that maps to the best available model in that tier at request time
Auto-routing — omit the model field entirely and set X-Inferoute-Strategy: cost or X-Inferoute-Strategy: latency to let Inferoute decide
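The three modes above can be sketched as a small request builder. This is an illustration only: `build_request` and `ALLOWED_STRATEGIES` are hypothetical names, and the only details taken from this page are the `model` field, the `X-Inferoute-Strategy` header, and its two values.

```python
from typing import Optional

# The two strategy values named on this page.
ALLOWED_STRATEGIES = {"cost", "latency"}


def build_request(messages: list, model: Optional[str] = None,
                  strategy: Optional[str] = None) -> tuple:
    """Return (headers, body) for one of the three selection modes.

    Hypothetical helper: it only assembles dictionaries and sends nothing.
    If a model is given, the strategy argument is ignored.
    """
    headers = {}
    body = {"messages": messages}
    if model is not None:
        # Provider-prefixed names and aliases both travel in the model field.
        body["model"] = model
    else:
        # Auto-routing: omit the model field and send the strategy header.
        if strategy not in ALLOWED_STRATEGIES:
            raise ValueError("auto-routing needs strategy 'cost' or 'latency'")
        headers["X-Inferoute-Strategy"] = strategy
    return headers, body


messages = [{"role": "user", "content": "Hello"}]
prefixed = build_request(messages, model="openai/gpt-4o")
aliased = build_request(messages, model="balanced")
auto = build_request(messages, strategy="latency")
```

Keeping the choice in one function means the rest of the application does not need to know which of the three modes is in use.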

Model categories

Premium

Highest quality for complex reasoning, long-form writing, and multi-step tasks. Best when output quality directly affects user experience.

openai/gpt-4o
anthropic/claude-3-5-sonnet
Alias: premium

Balanced

Strong capability at moderate cost. A good default for most production workloads where you want solid quality without paying top-tier prices.

openai/gpt-4o-mini
anthropic/claude-3-haiku
Alias: balanced

Economy

Fastest and cheapest. Ideal for classification, extraction, summarization, and any task where throughput matters more than nuance.

openai/gpt-4o-mini
anthropic/claude-3-haiku
google/gemini-flash
Alias: economy

Long context

For documents, codebases, and conversation histories that exceed standard context windows.

anthropic/claude-3-5-sonnet (200k tokens)
google/gemini-1.5-pro (1M tokens)
Alias: long-context
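One way to decide whether an input needs a long-context model is to estimate its token count and compare it with the window sizes listed above. The sketch below assumes a rough four-characters-per-token heuristic and a 4,000-token output reserve; neither comes from Inferoute, and both helper names are hypothetical. A real tokenizer will give different counts.

```python
# Context window sizes as listed on this page, in tokens.
CONTEXT_WINDOWS = {
    "anthropic/claude-3-5-sonnet": 200_000,
    "google/gemini-1.5-pro": 1_000_000,
}

# Assumption: a crude heuristic, not a real tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def models_that_fit(token_count: int, reserve_for_output: int = 4_000) -> list:
    """Return listed models whose window holds the input plus an output reserve.

    Results are ordered from the smallest window to the largest.
    """
    needed = token_count + reserve_for_output
    by_window = sorted(CONTEXT_WINDOWS.items(), key=lambda item: item[1])
    return [name for name, window in by_window if window >= needed]


document = "word " * 100_000          # stand-in for a large document
candidates = models_that_fit(estimate_tokens(document))
```

An empty result means the input exceeds every listed window and needs to be split or summarized before sending.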

Recommended models by use case

Use case	Recommended model	Why
Complex reasoning, analysis	`openai/gpt-4o`, `anthropic/claude-3-5-sonnet`	Highest instruction-following and reasoning accuracy
Chatbots, customer support	`openai/gpt-4o-mini`, `anthropic/claude-3-haiku`	Low latency, cost-effective at high volume
Document summarization	`anthropic/claude-3-5-sonnet`, `google/gemini-1.5-pro`	Long context windows handle full documents
Classification / labeling	`openai/gpt-4o-mini`, `google/gemini-flash`	Simple tasks don’t need expensive models
Embeddings (search, RAG)	`openai/text-embedding-3-small`, `openai/text-embedding-3-large`	Purpose-built for vector representations
Code generation	`openai/gpt-4o`, `anthropic/claude-3-5-sonnet`	Strong code comprehension and generation
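If the use case is known at runtime, the table above can be encoded as a plain lookup. This is a minimal sketch: the dictionary keys and the `recommend` helper are illustrative names, and the fallback to the `balanced` alias follows this page's description of it as a good default rather than any documented Inferoute behavior.

```python
# Recommendations copied from the table above; keys are illustrative labels.
RECOMMENDED = {
    "complex reasoning": ["openai/gpt-4o", "anthropic/claude-3-5-sonnet"],
    "chatbot": ["openai/gpt-4o-mini", "anthropic/claude-3-haiku"],
    "summarization": ["anthropic/claude-3-5-sonnet", "google/gemini-1.5-pro"],
    "classification": ["openai/gpt-4o-mini", "google/gemini-flash"],
    "embeddings": ["openai/text-embedding-3-small", "openai/text-embedding-3-large"],
    "code generation": ["openai/gpt-4o", "anthropic/claude-3-5-sonnet"],
}


def recommend(use_case: str, default_alias: str = "balanced") -> list:
    """Return recommended models for a use case, or a default alias if unknown."""
    return RECOMMENDED.get(use_case.strip().lower(), [default_alias])


first_choice = recommend("Classification")[0]
```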

Model tier comparison


Premium models deliver the best output quality across reasoning, writing, and instruction-following. Use them when accuracy and response quality are critical.

Models: openai/gpt-4o, anthropic/claude-3-5-sonnet
Alias: premium

python

# Assumes `client` is an OpenAI-compatible SDK client already configured for Inferoute.
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[
        {"role": "user", "content": "Analyze the trade-offs of microservices vs monolithic architecture for a startup with 5 engineers."},
    ],
)

node.js

const response = await client.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [
    {
      role: "user",
      content:
        "Analyze the trade-offs of microservices vs monolithic architecture for a startup with 5 engineers.",
    },
  ],
});

Typical cost: ~$5–$15 per 1M input tokens

Balanced models cover most production workloads well. They handle nuanced tasks at a fraction of premium model cost.

Models: openai/gpt-4o-mini, anthropic/claude-3-haiku
Alias: balanced

python

response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[
        {"role": "user", "content": "Draft a professional out-of-office email reply."},
    ],
)

node.js

const response = await client.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [
    {
      role: "user",
      content: "Draft a professional out-of-office email reply.",
    },
  ],
});

Typical cost: ~$0.15–$0.40 per 1M input tokens

Economy models are optimized for throughput and cost. Use them for tasks with clear, bounded outputs where you process high volumes.

Models: openai/gpt-4o-mini, anthropic/claude-3-haiku, google/gemini-flash
Alias: economy

python

response = client.chat.completions.create(
    model="economy",  # Inferoute alias — routes to cheapest available
    messages=[
        {"role": "user", "content": "Classify this support ticket as: billing, technical, or general."},
    ],
)

node.js

const response = await client.chat.completions.create({
  model: "economy", // Inferoute alias — routes to cheapest available
  messages: [
    {
      role: "user",
      content:
        "Classify this support ticket as: billing, technical, or general.",
    },
  ],
});

Typical cost: ~$0.07–$0.25 per 1M input tokens
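To compare tiers for a given volume, the typical per-1M-input-token ranges quoted for each tier can be turned into a rough estimate. The sketch below uses only those quoted figures; `estimate_input_cost` is a hypothetical helper, the 50M-token volume is an arbitrary example, and output tokens are not covered because this page gives no figures for them.

```python
# Typical cost ranges per 1M input tokens, as quoted for each tier on this page.
TYPICAL_INPUT_COST_PER_1M = {
    "premium": (5.00, 15.00),
    "balanced": (0.15, 0.40),
    "economy": (0.07, 0.25),
}


def estimate_input_cost(input_tokens: int, tier: str) -> tuple:
    """Return a (low, high) input-cost estimate for the given tier."""
    low, high = TYPICAL_INPUT_COST_PER_1M[tier]
    millions = input_tokens / 1_000_000
    return (low * millions, high * millions)


monthly_tokens = 50_000_000  # arbitrary example volume
for tier in TYPICAL_INPUT_COST_PER_1M:
    low, high = estimate_input_cost(monthly_tokens, tier)
    print(f"{tier}: {low:.2f} to {high:.2f}")
```

These are typical ranges rather than exact prices, so treat the result as an order-of-magnitude comparison between tiers.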

Provider-agnostic code with aliases

When you hardcode a provider-prefixed model name like openai/gpt-4o, swapping providers requires a code change. Inferoute aliases let you decouple your application from any specific provider.

python

# Hardcoded — requires code change to switch providers
model = "openai/gpt-4o"

# Alias — Inferoute resolves to the best model in this tier
# You can update the alias mapping in the dashboard without touching code
model = "premium"

You can configure what each alias resolves to in the Model Aliases section of your dashboard. This lets you A/B test new models, react to provider outages, or optimize costs — all without deploying code changes.
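A common way to keep the model choice out of application code is to read it from configuration and fall back to an alias. The sketch below assumes an environment variable named `INFEROUTE_MODEL`; that name and both helpers are hypothetical and not part of Inferoute.

```python
import os


def resolve_model_setting(env: dict, default: str = "premium") -> str:
    """Pick the model string from configuration, falling back to an alias."""
    # INFEROUTE_MODEL is a hypothetical variable name used for illustration.
    value = env.get("INFEROUTE_MODEL", "").strip()
    return value or default


def is_provider_prefixed(model: str) -> bool:
    """Provider-prefixed names on this page look like 'provider/model'."""
    return "/" in model


model = resolve_model_setting(dict(os.environ))
if is_provider_prefixed(model):
    print(f"Pinned to a specific provider model: {model}")
else:
    print(f"Using alias: {model}")
```

With an alias as the default, switching providers stays a dashboard change, and pinning a specific model remains possible by setting the variable.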

Embeddings

For embeddings, specify the model directly. Inferoute passes the request to the provider without modification.

python

response = client.embeddings.create(
    model="openai/text-embedding-3-small",
    input="Inferoute routes your LLM requests intelligently.",
)

embedding = response.data[0].embedding
print(f"Embedding dimensions: {len(embedding)}")  # 1536 for text-embedding-3-small

Model	Dimensions	Best for
`openai/text-embedding-3-small`	1536	High-volume search, RAG pipelines where cost matters
`openai/text-embedding-3-large`	3072	Highest accuracy retrieval, semantic similarity
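Once you have embedding vectors, search and RAG retrieval usually come down to ranking documents by similarity to a query vector. The sketch below shows cosine similarity in plain Python, using toy 3-dimensional vectors in place of real 1536- or 3072-dimensional embeddings. The helper names are illustrative, and production systems typically use a vector database or NumPy instead.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank_documents(query_vec: list, doc_vecs: dict) -> list:
    """Return document names ordered from most to least similar to the query."""
    return sorted(
        doc_vecs,
        key=lambda name: cosine_similarity(query_vec, doc_vecs[name]),
        reverse=True,
    )


# Toy vectors standing in for real embeddings.
query = [1.0, 0.0, 0.0]
docs = {
    "routing": [0.9, 0.1, 0.0],
    "billing": [0.0, 1.0, 0.0],
    "mixed": [0.5, 0.5, 0.0],
}
ranked = rank_documents(query, docs)
```

Vectors from different embedding models are not comparable, so embed queries and documents with the same model.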
