POST /v1/completions

The completions endpoint is a legacy text generation interface that accepts a plain prompt string and returns generated text. It is fully compatible with the OpenAI Completions API and is available for workloads that depend on the older prompt-in/text-out contract.

For most new applications, prefer the Chat Completions endpoint. It supports more capable models, structured conversations, and function calling. The completions endpoint exists primarily for backward compatibility.

Endpoint

POST https://api.inferoute.ai/v1/completions

Request parameters

model (string, required)

ID of the model to use. Use the provider-prefixed format (for example, openai/gpt-3.5-turbo-instruct) or the short name where it is unambiguous. Retrieve available model IDs from GET /v1/models.

prompt (string | string[], required)

The prompt text to complete. Pass a string for a single prompt or an array of strings to generate completions for multiple prompts in one request.

max_tokens (integer, default: 16)

Maximum number of tokens to generate per completion.

temperature (number, default: 1)

Sampling temperature between 0 and 2. Lower values are more deterministic; higher values are more creative.

stream (boolean, default: false)

Stream the response as server-sent events. Each event contains a partial completion delta. The stream ends with data: [DONE].
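The event format described above can be consumed with a small line parser. This is a minimal sketch, not a reference client: it assumes each `data:` payload is a JSON object whose `choices[].text` holds the partial text (the page does not spell out the chunk shape), and `iter_stream_text` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from server-sent event lines until data: [DONE]."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separators and anything that is not a data field.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        # Assumed chunk shape: {"choices": [{"index": 0, "text": "..."}]}
        for choice in event.get("choices", []):
            yield choice.get("text", "")


# Hand-written sample events standing in for a real HTTP response body.
sample = [
    'data: {"choices": [{"index": 0, "text": " Mount"}]}',
    "",
    'data: {"choices": [{"index": 0, "text": " Everest"}]}',
    "",
    "data: [DONE]",
]
streamed = "".join(iter_stream_text(sample))
print(streamed)
```

In a real client the `lines` argument would come from the HTTP library's line iterator over the response body.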

stop (string | string[])

One or more stop sequences. Generation stops when any sequence is encountered; the stop sequence itself is not included in the output.
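The truncation rule can be illustrated locally. The sketch below only mimics the documented behaviour on an already generated string; it is not how the server implements it, and `apply_stop` is a hypothetical name.

```python
from typing import Sequence, Union


def apply_stop(text: str, stop: Union[str, Sequence[str]]) -> str:
    """Cut text at the earliest stop sequence, excluding the sequence itself."""
    sequences = [stop] if isinstance(stop, str) else list(stop)
    # Collect the start position of every stop sequence that occurs in the text.
    positions = [text.find(s) for s in sequences if s and s in text]
    return text[: min(positions)] if positions else text


print(apply_stop("Step 1\nStep 2\nStep 3", "\n"))     # Step 1
print(apply_stop("alpha; beta. gamma", [".", ";"]))   # alpha
```

With several sequences, whichever occurs first in the text wins, regardless of its order in the list.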

best_of (integer, default: 1)

Generate this many completions server-side and return the best one (as measured by log probability). Higher values increase latency and token usage.

logprobs (integer)

Number of most likely tokens for which to return log probabilities at each position. Maximum value is 5.
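The limits listed above can be checked client-side before a request is sent. The following is a sketch under assumptions: `build_completion_request` is a hypothetical helper, and the field names `stop` and `logprobs` are taken from the OpenAI Completions API that this endpoint is described as compatible with; `model`, `prompt`, `max_tokens`, and `temperature` appear in the example on this page.

```python
from typing import Any, Dict, List, Optional, Union


def build_completion_request(
    model: str,
    prompt: Union[str, List[str]],
    max_tokens: int = 16,
    temperature: float = 1.0,
    stop: Optional[Union[str, List[str]]] = None,
    logprobs: Optional[int] = None,
) -> Dict[str, Any]:
    """Validate arguments against the documented limits and build a JSON body."""
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if logprobs is not None and logprobs > 5:
        raise ValueError("logprobs must not exceed 5")
    body: Dict[str, Any] = {
        "model": model,
        "prompt": prompt,  # a single string or a list of strings
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    # Optional fields are sent only when the caller sets them.
    if stop is not None:
        body["stop"] = stop
    if logprobs is not None:
        body["logprobs"] = logprobs
    return body


body = build_completion_request(
    "openai/gpt-3.5-turbo-instruct",
    ["The tallest mountain in the world is", "The longest river in the world is"],
    max_tokens=32,
    stop="\n",
)
print(sorted(body))
```

The resulting dictionary can be serialized with `json.dumps` and sent as the POST body.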

Response fields

id (string)

Unique identifier for this completion request.

object (string)

Always "text_completion".

model (string)

The model that served the request.

choices (object[])

choices[].index (integer)

Zero-based index of this choice.

choices[].text (string)

The generated text.

choices[].finish_reason (string)

Why generation stopped: "stop", "length", or "content_filter".
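Callers usually branch on this value. The sketch below is one possible handling, assuming the OpenAI-style meaning of each reason ("length" when the token limit cut the output short, "content_filter" when output was withheld); `check_choice` is a hypothetical helper.

```python
def check_choice(choice: dict) -> str:
    """Return the choice text, flagging truncated or filtered output."""
    reason = choice["finish_reason"]
    if reason == "stop":
        return choice["text"]
    if reason == "length":
        # Output hit the token limit; the caller may retry with a larger limit.
        return choice["text"] + " [truncated]"
    if reason == "content_filter":
        raise RuntimeError("completion was blocked by the content filter")
    raise ValueError(f"unexpected finish_reason: {reason!r}")


print(check_choice({"index": 0, "text": " Mount Everest", "finish_reason": "stop"}))
print(check_choice({"index": 0, "text": " Mount Ever", "finish_reason": "length"}))
```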

usage (object)

usage.prompt_tokens (integer)

Number of tokens in the prompt.

usage.completion_tokens (integer)

Number of tokens generated.

usage.total_tokens (integer)

Sum of prompt and completion tokens.

Example

curl https://api.inferoute.ai/v1/completions \
  --request POST \
  --header "Authorization: Bearer YOUR_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "openai/gpt-3.5-turbo-instruct",
    "prompt": "The tallest mountain in the world is",
    "max_tokens": 64,
    "temperature": 0.5
  }'

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.inferoute.ai/v1",
)

response = client.completions.create(
    model="openai/gpt-3.5-turbo-instruct",
    prompt="The tallest mountain in the world is",
    max_tokens=64,
    temperature=0.5,
)
print(response.choices[0].text)

Response:

{
  "id": "cmpl-xyz789",
  "object": "text_completion",
  "model": "openai/gpt-3.5-turbo-instruct",
  "choices": [
    {
      "index": 0,
      "text": " Mount Everest, standing at 8,848 metres (29,029 ft) above sea level.",
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 18,
    "total_tokens": 28
  }
}
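For clients that do not use an SDK, the response body can be read with the standard library alone. This sketch parses an abbreviated copy of the response above and checks that the usage counts add up; the variable names are illustrative only.

```python
import json

# Abbreviated copy of the example response body shown above.
raw = """
{
  "id": "cmpl-xyz789",
  "object": "text_completion",
  "choices": [
    {"index": 0, "text": " Mount Everest", "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 10, "completion_tokens": 18, "total_tokens": 28}
}
"""

response = json.loads(raw)
usage = response["usage"]

# Generated text typically starts with a space, so strip it for display.
answer = response["choices"][0]["text"].strip()
# total_tokens is documented as the sum of prompt and completion tokens.
totals_match = usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]

print(answer)
print(totals_match)
```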
