The completions endpoint is a legacy text generation interface that accepts a plain prompt string and returns generated text. It is fully compatible with the OpenAI Completions API and is available for workloads that depend on the older prompt-in/text-out contract.
For most new applications, prefer the Chat Completions endpoint. It supports more capable models, structured conversations, and function calling. The completions endpoint exists primarily for backward compatibility.

Endpoint

POST https://api.tokenhub.ai/v1/completions

Request parameters

model
string
required
The model to use. Use the provider-prefixed format (for example, openai/gpt-3.5-turbo-instruct) or the short name where it is unambiguous. Retrieve available model IDs from GET /v1/models.
prompt
string | string[]
required
The prompt text to complete. Pass a string for a single prompt or an array of strings to generate completions for multiple prompts in one request.
max_tokens
integer
default:"16"
Maximum number of tokens to generate per completion.
temperature
number
default:"1"
Sampling temperature between 0 and 2. Lower values are more deterministic; higher values are more creative.
stream
boolean
default:"false"
Stream the response as server-sent events. Each event contains a partial completion delta. The stream ends with data: [DONE].
stop
string | string[]
One or more stop sequences. Generation stops when any sequence is encountered; the stop sequence itself is not included in the output.
best_of
integer
default:"1"
Generate this many candidate completions server-side and return only the one with the highest log probability. Higher values increase latency and token usage.
logprobs
integer
Include log probabilities for the most likely tokens at each position; the value of logprobs sets how many tokens are reported. Maximum value is 5.
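
The parameters above can be assembled into a request roughly as follows. This is a minimal sketch using only the Python standard library; the helper name build_completion_request is hypothetical, and the client-side range checks simply mirror the limits stated on this page (they are an assumption about useful pre-validation, not behavior the API requires).

```python
import json
import urllib.request

API_URL = "https://api.tokenhub.ai/v1/completions"  # endpoint from this page


def build_completion_request(api_key, model, prompt, **options):
    """Return an unsent urllib Request for the completions endpoint.

    Hypothetical helper: the checks below only mirror the documented limits.
    """
    # prompt may be a single string or a list of strings (multiple prompts).
    if not isinstance(prompt, str):
        prompt = list(prompt)
        if not prompt or not all(isinstance(p, str) for p in prompt):
            raise TypeError("prompt must be a string or a non-empty list of strings")

    temperature = options.get("temperature")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")

    logprobs = options.get("logprobs")
    if logprobs is not None and not 0 <= logprobs <= 5:
        raise ValueError("logprobs must be at most 5")

    # Drop unset options so the server applies its documented defaults.
    body = {"model": model, "prompt": prompt}
    body.update({k: v for k, v in options.items() if v is not None})

    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Building the request does not contact the server; passing the returned object to urllib.request.urlopen would send it.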

Response fields

id
string
Unique identifier for this completion request.
object
string
Always "text_completion".
model
string
The model that served the request.
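choices
object[]
The generated completions. Each entry in the example below contains index, text, and finish_reason.
usage
object
Token counts for the request. The example below reports prompt_tokens, completion_tokens, and total_tokens.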
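
A short sketch of reading these fields from a response body, assuming the shape shown in the example on this page (choices entries with index, text, and finish_reason; usage with total_tokens). The function name summarize_completion is hypothetical, and sorting by index is a defensive assumption rather than something the API is documented to require.

```python
import json


def summarize_completion(raw_json):
    """Return (texts, finish_reasons, total_tokens) from a completions response."""
    response = json.loads(raw_json)
    if response.get("object") != "text_completion":
        raise ValueError(f"unexpected object type: {response.get('object')!r}")

    # Sort by index in case choices arrive out of order (defensive assumption).
    choices = sorted(response["choices"], key=lambda c: c["index"])
    texts = [c["text"] for c in choices]
    finish_reasons = [c["finish_reason"] for c in choices]
    total_tokens = response["usage"]["total_tokens"]
    return texts, finish_reasons, total_tokens
```

When the request carries several prompts, the returned lists hold one entry per choice in the response.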

Example

curl https://api.tokenhub.ai/v1/completions \
  --request POST \
  --header "Authorization: Bearer YOUR_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "openai/gpt-3.5-turbo-instruct",
    "prompt": "The tallest mountain in the world is",
    "max_tokens": 64,
    "temperature": 0.5
  }'
Response:
{
  "id": "cmpl-xyz789",
  "object": "text_completion",
  "model": "openai/gpt-3.5-turbo-instruct",
  "choices": [
    {
      "index": 0,
      "text": " Mount Everest, standing at 8,848 metres (29,029 ft) above sea level.",
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 18,
    "total_tokens": 28
  }
}
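
When stream is set to true, the response arrives as server-sent events that end with data: [DONE]. The sketch below shows one way to consume such a stream from already-decoded lines. This page does not show a streamed chunk, so the chunk shape used here (a JSON object whose choices[].text holds the partial text, mirroring the non-streaming response) is an assumption, and iter_stream_text is a hypothetical helper name.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from an iterable of decoded SSE lines."""
    for line in lines:
        line = line.strip()
        # Skip blank separators, comments, and any non-data fields.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        # The documented terminator: stop reading once it appears.
        if payload == "[DONE]":
            return
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # Assumed chunk shape: choices[].text carries the partial text.
            yield choice.get("text", "")
```

Joining the yielded fragments reconstructs the full completion text for a single-prompt request.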