Chat Completions API

The Chat Completions API lets you hold multi-turn conversations with the LLM, following a chat-based interaction pattern. This is useful for building chatbots, assistants, or any application that requires the AI to maintain context across turns.

Endpoint

POST /v1/chat/completions

Use this endpoint to request a chat-based completion (an assistant response) given a sequence of messages representing the conversation.

Request Body Parameters

  • model (string, required): The ID or name of the chat-optimized model to use (for example, "cinclus-chat-001" or another conversational model).
  • messages (array of objects, required): A list of message objects that represent the conversation so far. Each message has:
    • role (string): Role of the message sender. Valid values are "system", "user", or "assistant".
    • content (string): The content of the message.

The conversation typically begins with a system message to set context or instructions, followed by alternating user and assistant messages. For example:

"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello, who are you?"},
{"role": "assistant", "content": "I am an AI assistant. How can I help you today?"},
{"role": "user", "content": "Can you tell me a joke?"}
]

In this case, the next response will be generated from this dialogue history, with the last message being the user's request for a joke.

  • max_tokens (integer, optional): The maximum number of tokens to generate for the assistant's reply.
  • temperature (float, optional): Controls randomness of the response (same meaning as in Text Completions API).
  • top_p (float, optional): Nucleus sampling parameter (same meaning as in Text Completions API).
  • n (integer, optional): How many responses to generate. Defaults to 1. If greater than 1, the choices array in the response will contain multiple alternative replies.
  • stop (string or array of strings, optional): Stop sequences for the chat completion. If the model generates any of these sequences, it will stop at that point.
  • zone (string, optional): Region code to handle the request in a specific zone (e.g., "NO", "SE", "DK", "FI"). If omitted, the service auto-selects a zone.
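
As a sketch of how the optional parameters combine, the payload below (written as a Python dict ready for JSON encoding) requests two alternative replies and stops generation at the first blank line. The specific values, and the stop string itself, are illustrative assumptions rather than recommendations:

payload = {
    "model": "cinclus-chat-001",
    "messages": [
        {"role": "user", "content": "Suggest a name for a coffee shop."}
    ],
    "n": 2,              # ask for two alternative replies
    "stop": ["\n\n"],    # halt at the first blank line (illustrative choice)
    "temperature": 0.9,  # higher randomness suits creative output
    "max_tokens": 50,
    "zone": "NO",        # pin the request to the Norwegian zone
}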

Note: The messages array should include the latest user prompt as the last item. The model will then produce an assistant message as the next response. Always include your API key in the Authorization header as well.

Example Request

Here's a cURL example for a chat conversation:

curl -X POST https://api.cinclus.wayscloud.services/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cinclus-chat-001",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello, who are you?"}
    ],
    "temperature": 0.7,
    "max_tokens": 100
  }'

In this example, we provide a system message that sets the context and a user message that asks a question.

Example Request (JSON payload)

{
  "model": "cinclus-chat-001",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, who are you?"}
  ],
  "temperature": 0.7,
  "max_tokens": 100
}
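
For convenience, here is a minimal Python sketch of the same request using the third-party requests library. The endpoint, headers, and payload come from the examples above; the surrounding code structure is our own illustration, not an official client:

import requests

API_KEY = "YOUR_API_KEY"
URL = "https://api.cinclus.wayscloud.services/v1/chat/completions"

response = requests.post(
    URL,
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "model": "cinclus-chat-001",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello, who are you?"},
        ],
        "temperature": 0.7,
        "max_tokens": 100,
    },
    timeout=30,
)
response.raise_for_status()  # fail fast on HTTP errors
data = response.json()       # parsed response, used in the snippets below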

Example Response

{
  "id": "chatcmpl-7a8B9CdEfGHijKLMnOpqrST",
  "object": "chat.completion",
  "created": 1630000005,
  "model": "cinclus-chat-001",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm an AI assistant here to help you. How can I assist you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 20,
    "total_tokens": 32
  }
}

In this response:

  • message.content contains the assistant's reply, which is generated based on the conversation history.
  • finish_reason indicates why the model stopped. "stop" typically means the model completed the response as desired.
  • usage shows the token counts: prompt_tokens covers every token in the messages you sent, completion_tokens covers the tokens in the assistant's reply, and total_tokens is their sum. These figures help you monitor your usage.
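
Continuing from the data object in the Python sketch above, reading these fields looks roughly like this:

# The assistant's reply lives in the first element of choices
reply = data["choices"][0]["message"]["content"]
print(reply)

# Why generation stopped, e.g. "stop"
print(data["choices"][0]["finish_reason"])

# Token accounting for monitoring usage
usage = data["usage"]
print(usage["prompt_tokens"], usage["completion_tokens"], usage["total_tokens"])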

Maintaining Conversation State

Cinclus does not store conversation state between requests. To continue a conversation, your application should append the latest assistant response and the new user prompt to the messages list for the next request. For example:

  1. Send an initial request with a system message and a user message. Receive the assistant's answer.
  2. Append the assistant's answer to the messages array (with role "assistant").
  3. When the user asks another question or replies, append that as a new "user" message at the end of the array.
  4. Send the updated messages array in your next request.

By resending the full conversation history with each request, you enable the model to maintain context across turns.
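
The following Python sketch walks through those four steps, reusing the requests-based call from earlier; the get_reply helper is our own wrapper, not part of the API:

import requests

API_KEY = "YOUR_API_KEY"
URL = "https://api.cinclus.wayscloud.services/v1/chat/completions"

def get_reply(messages):
    # Send the full conversation history and return the assistant's reply.
    resp = requests.post(
        URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"model": "cinclus-chat-001", "messages": messages, "max_tokens": 200},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# Step 1: initial request with a system message and a user message
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, who are you?"},
]
answer = get_reply(messages)

# Step 2: append the assistant's answer to the history
messages.append({"role": "assistant", "content": answer})

# Step 3: append the user's next message at the end
messages.append({"role": "user", "content": "Can you tell me a joke?"})

# Step 4: send the updated array; the model sees the whole dialogue
answer = get_reply(messages)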

Tips for Chat Completions

  • System Messages: Use the initial system message to guide the assistant's behavior (e.g., define the assistant's role, tone, or special instructions). This can help you get more relevant or formatted responses.
  • Limiting History: Be mindful of the model's token limit, which applies to the entire conversation. If the conversation grows long, you may need to remove or summarize older messages to stay within the context window; a trimming sketch follows this list.
  • Multiple Responses: If you request n > 1 responses, the choices array will contain multiple different assistant replies. This can be useful to present alternatives or for testing different responses.
  • Streaming: (If supported) You might be able to receive the response as a stream of data (token by token) for lower latency and real-time applications. Check whether the API offers a streaming mode for chat completions before relying on it.
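
As a rough sketch of the history-trimming tip, the helper below keeps any system messages plus only the most recent other turns. Counting messages is a crude stand-in; production code would count tokens with the model's actual tokenizer, which this documentation does not specify:

def trim_history(messages, max_messages=20):
    # Keep system messages plus only the most recent other turns.
    if len(messages) <= max_messages:
        return messages
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    keep = max(max_messages - len(system), 1)
    return system + rest[-keep:]

messages = trim_history(messages)  # call before each request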

For single-turn text generation without maintaining a conversation, see the Text Completions API. For generating embeddings from text, see the Embeddings API.