Introduction

Cinclus LLM On-Demand is a cloud-based service providing on-demand access to Large Language Model (LLM) capabilities via a simple API. It allows developers to integrate advanced natural language processing into their applications without managing any infrastructure. With Cinclus, you can generate human-like text, hold conversations with AI, and create semantic text embeddings for search and analysis.

Key Features

  • Multiple LLM Models: Choose from a range of models (various sizes and capabilities) to fit your needs, from fast smaller models to powerful large ones.
  • Text and Chat Completions: Generate text completions for prompts or engage in multi-turn conversations with a chat-centric API.
  • Embeddings: Create high-dimensional vector embeddings from text for semantic search, clustering, or recommendation systems.
  • Regional Zones: Optionally select the data center zone (e.g., NO, SE, DK, FI) to process your requests for compliance or latency requirements.
  • Scalable On-Demand: Automatically leverage shared capacity or dedicated instances to handle your workloads, with seamless scaling and failover across zones if needed.
  • Secure API: Authentication via API keys, with encryption in transit (HTTPS) and robust monitoring and rate limiting.

Base URL and Versioning

The Cinclus LLM API is accessed through a base URL. All endpoints documented here are relative to the base URL:

Base URL: https://api.cinclus.wayscloud.services/v1/

This base URL includes the API version (v1). All requests must be made over HTTPS. Future updates may introduce new versions (v2, etc.); existing versions will remain available so that integrations built against v1 continue to work.
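Since all documented endpoints are relative to the versioned base URL, request URLs can be built by resolving an endpoint path against it. The helper below is a small sketch; the example path `completions` is an assumed endpoint name, not confirmed by this section.

```python
from urllib.parse import urljoin

BASE_URL = "https://api.cinclus.wayscloud.services/v1/"

def endpoint_url(path: str) -> str:
    """Resolve an endpoint path against the versioned base URL.

    The trailing slash on BASE_URL matters: it keeps the /v1/ segment
    intact when urljoin resolves the relative path.
    """
    return urljoin(BASE_URL, path.lstrip("/"))
```

For example, `endpoint_url("completions")` yields `https://api.cinclus.wayscloud.services/v1/completions`.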

Quick Start

  1. Obtain an API Key: Sign up for a Cinclus account and retrieve your API key from the dashboard. See Authentication for details.
  2. Choose a Model: Decide which model suits your task. For example, use a general-purpose model for chat or a specialized model for embeddings.
  3. Make a Request: Use the appropriate endpoint for your task (detailed in subsequent sections):
    • For generating text given a prompt, use the Text Completions endpoint.
    • For conversational AI, use the Chat Completions endpoint.
    • For obtaining text embeddings, use the Embeddings endpoint.
    • Include your API key in the request header for authentication (see Authentication).
  4. Handle the Response: Parse the JSON response from the API, which contains the model's output (the completion or embedding) along with usage details.
  5. Monitor Usage: Track your usage and ensure you stay within rate limits. See Rate Limits & Usage for guidance.
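Steps 3 and 4 above can be sketched end to end: build an authenticated request, then parse the JSON response. This is a sketch under stated assumptions; the `Authorization: Bearer` header and the `choices`/`usage` response fields are illustrative guesses, so defer to the Authentication and endpoint sections for the actual header name and response schema.

```python
import json
import urllib.request

BASE_URL = "https://api.cinclus.wayscloud.services/v1/"

def build_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Assemble an authenticated POST request to a Cinclus endpoint.

    The 'Authorization: Bearer <key>' scheme is an assumption here; see
    the Authentication section for the required header.
    """
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def extract_output(response_body: str) -> tuple[str, dict]:
    """Pull the model output and usage details from a JSON response body.

    The 'choices' and 'usage' field names are assumptions for illustration.
    """
    data = json.loads(response_body)
    return data["choices"][0]["text"], data.get("usage", {})
```

A request built this way would be sent with `urllib.request.urlopen(req)` and the decoded body passed to `extract_output`.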

Proceed to the next sections for detailed information on authentication, endpoint usage, examples, and advanced features like scaling and multi-zone offloading.