The completions endpoint is a legacy text generation interface that accepts a plain prompt string and returns generated text. It is fully compatible with the OpenAI Completions API and is available for workloads that depend on the older prompt-in/text-out contract.
For most new applications, prefer the Chat Completions endpoint. It supports more capable models, structured conversations, and function calling. The completions endpoint exists primarily for backward compatibility.
Endpoint
Request parameters
The model to use. Use the provider-prefixed format (openai/gpt-3.5-turbo-instruct) or the short name where unambiguous. Retrieve available model IDs from GET /v1/models.
The prompt text to complete. Pass a string for a single prompt or an array of strings to generate completions for multiple prompts in one request.
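As a rough illustration of the two prompt shapes described above, the sketch below builds a JSON request body for either a single string or an array of strings. The helper name build_completion_body is hypothetical, and the JSON keys "model" and "prompt" are assumptions taken from the OpenAI Completions API that this endpoint is described as compatible with. Nothing is sent over the network.

```python
import json
from typing import Union


def build_completion_body(model: str, prompt: Union[str, list[str]]) -> str:
    """Serialize a request body (hypothetical helper; key names are assumed)."""
    if isinstance(prompt, list):
        # A batch request: every element must itself be a prompt string.
        if not all(isinstance(p, str) for p in prompt):
            raise TypeError("every prompt in the list must be a string")
    elif not isinstance(prompt, str):
        raise TypeError("prompt must be a string or a list of strings")
    return json.dumps({"model": model, "prompt": prompt})


# One prompt, one completion.
single = build_completion_body("openai/gpt-3.5-turbo-instruct", "Once upon a time")

# Several prompts in one request.
batch = build_completion_body(
    "openai/gpt-3.5-turbo-instruct",
    ["Translate to French: cat", "Translate to French: dog"],
)
```

Passing a list keeps related prompts in a single request, at the cost of one larger response to unpack.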
Maximum number of tokens to generate per completion.
Sampling temperature between 0 and 2. Lower values are more deterministic; higher values are more creative.
Stream the response as server-sent events. Each event contains a partial completion delta. The stream ends with data: [DONE].
One or more stop sequences. Generation stops when any sequence is encountered; the stop sequence itself is not included in the output.
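The streaming behaviour described above can be consumed along the following lines. This is a minimal sketch, not a reference client: the function name iter_completion_text is hypothetical, and the event shape (a JSON object with a "choices" array whose items carry a "text" delta) is an assumption based on the OpenAI Completions API. The sample lines are made up for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_completion_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from server-sent event lines until data: [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        event = json.loads(payload)
        # Assumed event shape: {"choices": [{"text": "..."}]}
        for choice in event.get("choices", []):
            yield choice.get("text", "")


# Illustrative input; a real client would read these lines from the HTTP response.
sample_stream = [
    'data: {"choices": [{"text": "Hello"}]}',
    "",
    'data: {"choices": [{"text": ", world"}]}',
    "",
    "data: [DONE]",
    'data: {"choices": [{"text": "ignored"}]}',
]
streamed_text = "".join(iter_completion_text(sample_stream))
```

Anything after the data: [DONE] line is ignored, so the caller can treat exhaustion of the iterator as the end of the completion.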
Generate this many completions server-side and return the best one (as measured by log probability). Higher values increase latency and token usage.
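To make the selection criterion concrete, the sketch below ranks candidate completions by log probability on the client side. It only illustrates the idea: the actual selection happens on the server, and the scoring rule used here (mean log probability per token) is an assumption. The function name pick_best and the sample candidates are hypothetical.

```python
from typing import Sequence


def pick_best(candidates: Sequence[tuple[str, Sequence[float]]]) -> str:
    """Return the text whose tokens have the highest mean log probability."""

    def mean_logprob(item: tuple[str, Sequence[float]]) -> float:
        _, token_logprobs = item
        if not token_logprobs:
            return float("-inf")  # an empty completion never wins
        return sum(token_logprobs) / len(token_logprobs)

    return max(candidates, key=mean_logprob)[0]


# Made-up candidates: (text, per-token log probabilities).
candidates = [
    ("the cat sat", [-0.2, -0.4, -0.3]),
    ("a cat sits", [-0.9, -0.1, -0.8]),
]
best = pick_best(candidates)
```

Because every candidate is generated in full before one is chosen, the token cost scales with the number of candidates even though only one completion is returned.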
Include log probabilities for the top logprobs tokens at each position. Maximum value is 5.
Response fields
Unique identifier for this completion request.
Always "text_completion".
The model that served the request.