The Text Completions API generates a text completion given a prompt. This endpoint is ideal for single-turn tasks such as content generation, summarization, or Q&A: you provide a prompt or partial sentence, and the model continues or responds with a completion.
Endpoint
POST /v1/completions
Use this endpoint to request a text completion. The request body should be JSON and include the prompt and any generation parameters.
Request Body Parameters
- `model` (string, required): The ID or name of the model to use for this completion. Example: `"cinclus-text-001"` or `"gpt-3.5-turbo"` (depending on the models available in Cinclus).
- `prompt` (string, required): The input text prompt that you want the model to complete or respond to. This can be a question, a partial sentence, or any text.
- `max_tokens` (integer, optional): The maximum number of tokens (words or word pieces) to generate in the completion. This limits how long the generated response can be. For instance, `max_tokens: 100` would generate up to 100 tokens.
- `temperature` (float, optional): Controls the randomness of the output. Range is typically 0.0 to 1.0. Lower values (e.g., 0.2) make output more focused and deterministic, while higher values (e.g., 0.8) make it more random and creative.
- `top_p` (float, optional): An alternative to temperature for controlling output randomness. This uses nucleus sampling: the model considers only the most likely tokens with cumulative probability up to `top_p`. For example, `top_p: 0.9` means only tokens comprising the top 90% of the probability mass are considered.
- `n` (integer, optional): How many completion choices to generate for each prompt. Defaults to 1. If you set `n: 3`, the API will return three separate generated completions in a single response.
- `stop` (string or array of strings, optional): One or more sequences at which the model will stop generating further tokens. For example, setting `stop: ["\n\n"]` tells the model to stop when it generates a double newline.
- `zone` (string, optional): The region code to process this request in a specific zone. Possible values include `"NO"` (Norway), `"SW"` (Sweden), `"DK"` (Denmark), `"FI"` (Finland). If omitted, the service will route the request to an available zone automatically.
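As a sketch of how these parameters fit together, a client might assemble the request body like this. The helper name `build_completion_payload` is hypothetical (not part of the API); only `model` and `prompt` are required, and omitted optional parameters are left out so the service applies its defaults:

```python
import json

def build_completion_payload(model, prompt, *, max_tokens=None, temperature=None,
                             top_p=None, n=None, stop=None, zone=None):
    """Hypothetical helper: build a JSON request body for /v1/completions."""
    if not model or not prompt:
        raise ValueError("`model` and `prompt` are required")
    payload = {"model": model, "prompt": prompt}
    optional = {"max_tokens": max_tokens, "temperature": temperature,
                "top_p": top_p, "n": n, "stop": stop, "zone": zone}
    # Drop unset parameters so the service falls back to its own defaults.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return json.dumps(payload)

# Mirrors the example request later in this page.
body = build_completion_payload("cinclus-text-001",
                                "Write a short greeting for a new user:",
                                max_tokens=50, temperature=0.7, zone="NO")
```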
Note: All parameters are passed within a JSON object in the request body. The `Authorization` header with your API key must also be provided (see Authentication).
Example Request
Here is an example using cURL to ask the model a simple question:
curl -X POST https://api.cinclus.wayscloud.services/v1/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "cinclus-text-001",
"prompt": "Write a short greeting for a new user:",
"max_tokens": 50,
"temperature": 0.7,
"zone": "NO"
}'
Example Request (JSON payload)
The JSON payload for the above request would be:
{
"model": "cinclus-text-001",
"prompt": "Write a short greeting for a new user:",
"max_tokens": 50,
"temperature": 0.7,
"zone": "NO"
}
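The same request can be built from Python's standard library. This sketch constructs the HTTP request but does not send it; `YOUR_API_KEY` is a placeholder, exactly as in the cURL example:

```python
import json
import urllib.request

payload = {
    "model": "cinclus-text-001",
    "prompt": "Write a short greeting for a new user:",
    "max_tokens": 50,
    "temperature": 0.7,
    "zone": "NO",
}

# Build the POST request with the JSON body and required headers.
req = urllib.request.Request(
    "https://api.cinclus.wayscloud.services/v1/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    method="POST",
)
# To send it: urllib.request.urlopen(req)
```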
Example Response
A successful response will return a JSON object containing the generated completion and some metadata. For example:
{
"id": "cmpl-5dFy8AbCdEfGhIjKlMNOPqrs",
"object": "text_completion",
"created": 1630000000,
"model": "cinclus-text-001",
"choices": [
{
"text": "Hello and welcome! We're excited to have you here. If you have any questions, feel free to ask!",
"index": 0,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 9,
"completion_tokens": 20,
"total_tokens": 29
}
}
In this response:
- `choices`: An array of completion results (since we requested only one, it has one element). The `text` field contains the generated completion.
- `finish_reason`: Indicates why the model stopped generating. `"stop"` means it stopped naturally (e.g., it encountered a stop sequence or completed the text). Other values might be `"length"` if it stopped because it reached the `max_tokens` limit.
- `usage`: Token usage for this request. `prompt_tokens` is how many tokens were in your input prompt, `completion_tokens` is how many tokens the model generated, and `total_tokens` is the sum. This helps track billing or usage limits.
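To illustrate, a client might extract the generated text and usage data from the example response above like this:

```python
import json

# The example response from above, as a client would receive and parse it.
response = json.loads("""
{
  "id": "cmpl-5dFy8AbCdEfGhIjKlMNOPqrs",
  "object": "text_completion",
  "created": 1630000000,
  "model": "cinclus-text-001",
  "choices": [
    {
      "text": "Hello and welcome! We're excited to have you here. If you have any questions, feel free to ask!",
      "index": 0,
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 9, "completion_tokens": 20, "total_tokens": 29}
}
""")

completion = response["choices"][0]["text"]                 # the generated text
finished = response["choices"][0]["finish_reason"] == "stop" # ended naturally?
total = response["usage"]["total_tokens"]                    # for usage tracking
```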
Usage and Best Practices
Always review the `finish_reason` to determine whether the completion ended naturally (`"stop"`) or was cut off (`"length"`). If the completion seems incomplete (e.g., the model stopped mid-sentence because it reached `max_tokens`), you can resubmit with a higher `max_tokens` value. Use the `stop` parameter to prevent the model from trailing off into irrelevant text; for example, when generating a snippet of code or a single sentence, a stop sequence like `"\n"` controls where the output ends. The `model` parameter allows you to switch between different language models: use a larger model for more complex tasks and a smaller one for faster, cost-effective results. The `zone` parameter is optional; if data residency or latency is important, specify a zone, otherwise omit it and let Cinclus route your request optimally. If you need interactive, multi-turn conversations, use the Chat Completions API. For embedding generation, see the Embeddings API.
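One illustrative retry policy for truncated completions (this heuristic is an assumption, not part of the API): if `finish_reason` is `"length"`, resubmit with a larger `max_tokens` budget, up to some ceiling.

```python
def next_max_tokens(finish_reason, current_max_tokens, ceiling=1024):
    """Hypothetical policy: suggest a larger max_tokens after a truncated
    completion, or None if no retry is needed (or the ceiling is reached)."""
    if finish_reason == "length" and current_max_tokens < ceiling:
        return min(current_max_tokens * 2, ceiling)  # double the budget
    return None  # "stop": the completion ended naturally

retry_budget = next_max_tokens("length", 50)  # truncated at 50 tokens
```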