The Embeddings API converts input text into a numeric vector representation (embedding). Embeddings are useful for measuring semantic similarity between texts, performing nearest-neighbor searches in vector databases, clustering, and more. Cinclus's embedding models transform text into a high-dimensional vector space where similar texts are represented by vectors that are close together.
Endpoint
POST /v1/embeddings
Use this endpoint to generate embeddings for one or more text inputs.
Request Body Parameters
- model (string, required): The ID or name of the embedding model to use. Embedding models are typically different from completion models and are optimized for producing vector representations (for example, "cinclus-embed-001").
- input (string or array of strings, required): The text to embed. You can provide a single string, or an array of strings for batch embedding. If an array is provided, each string in the array is embedded separately, and the response contains an embedding for each input.
- zone (string, optional): The region code to use if you want embedding generation to occur in a specific zone (e.g., "NO", "SW", "DK", "FI"). If not provided, the system automatically chooses an available zone.
Example Request
Single text embedding (with cURL):
curl -X POST https://api.cinclus.wayscloud.services/v1/embeddings \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "cinclus-embed-001",
"input": "Artificial intelligence and machine learning"
}'
Batch embedding for multiple texts:
curl -X POST https://api.cinclus.wayscloud.services/v1/embeddings \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "cinclus-embed-001",
"input": ["Machine learning is fascinating", "I love learning about AI"]
}'
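The same request can be sent from any HTTP client. As a minimal sketch, here is the batch call above rewritten in Python using the third-party requests library; the embed helper name is ours, and YOUR_API_KEY is a placeholder, not part of the API.

import requests

API_URL = "https://api.cinclus.wayscloud.services/v1/embeddings"
HEADERS = {
    "Authorization": "Bearer YOUR_API_KEY",  # replace with a real key
    "Content-Type": "application/json",
}

def embed(texts, model="cinclus-embed-001"):
    # "input" accepts a single string or a list of strings (batch).
    response = requests.post(API_URL, headers=HEADERS,
                             json={"model": model, "input": texts})
    response.raise_for_status()
    return response.json()

result = embed(["Machine learning is fascinating", "I love learning about AI"])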
Example Response
For a single input:
{
"object": "list",
"data": [
{
"object": "embedding",
"embedding": [0.0216, -0.0435, 0.0078, ...],
"index": 0
}
],
"model": "cinclus-embed-001",
"usage": {
"prompt_tokens": 5,
"total_tokens": 5
}
}
For multiple inputs, the data array will contain multiple objects, each with an index corresponding to the position of the input in the request.
In the response:
- embedding: An array of floating-point numbers, the vector representation of the input text in the model's embedding space. The length of this array (the dimensionality of the embedding) depends on the model (e.g., 512, 768, or 1024 dimensions).
- index: If you sent multiple inputs, this indicates which input the embedding corresponds to (using 0-based indexing).
- usage: Indicates how many tokens were processed. Embedding usage counts tokens in the input text; for embeddings, prompt_tokens == total_tokens, as there is no "completion".
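To make the index bookkeeping concrete, the Python sketch below pulls the vectors out of a parsed batch response in input order. The numbers are made up for illustration; real vectors are far longer.

# A parsed batch response; short made-up numbers stand in for real vectors.
response_json = {
    "object": "list",
    "data": [
        {"object": "embedding", "embedding": [0.0216, -0.0435, 0.0078], "index": 0},
        {"object": "embedding", "embedding": [0.0110, 0.0302, -0.0555], "index": 1},
    ],
    "model": "cinclus-embed-001",
    "usage": {"prompt_tokens": 11, "total_tokens": 11},
}

# Sort by "index" so each vector lines up with its input's position.
ordered = sorted(response_json["data"], key=lambda item: item["index"])
vectors = [item["embedding"] for item in ordered]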
Usage and Applications
- Semantic Search: Convert your documents and queries to embeddings and use vector similarity to find relevant matches (e.g., find documents most similar to a query); see the sketch after this list.
- Clustering & Classification: Embed sentences or documents and apply clustering algorithms or classification models in the embedding space to group or categorize content by similarity.
- Recommendation Systems: Use embeddings of user preferences or item descriptions to recommend similar items (for example, find products with similar description embeddings).
- Cross-language: If the embedding model supports multiple languages, embeddings can be used to compare text semantics across languages (e.g., an English query vs. French documents).
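To make the semantic-search item above concrete, here is a minimal Python sketch that ranks documents by cosine similarity to a query. The stand-in vectors would in practice come from the Embeddings API, generated with the same model for the query and every document; the helper name is ours, not part of the API.

import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Stand-in vectors; real ones come from the Embeddings API.
query_vec = [0.1, 0.3, 0.5]
doc_vecs = {
    "doc-a": [0.1, 0.2, 0.6],
    "doc-b": [-0.4, 0.1, 0.0],
}

# Rank documents by similarity to the query, best match first.
ranked = sorted(doc_vecs.items(),
                key=lambda item: cosine_similarity(query_vec, item[1]),
                reverse=True)
print(ranked)  # doc-a should rank above doc-b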
Best Practices
- The input text for embedding should be reasonably sized. Extremely long texts may need to be summarized or split into smaller chunks (for example, embed each paragraph separately) due to token limits; a chunking sketch follows this list.
- Use the same model for generating embeddings on both sides of a comparison. For example, if you're comparing a question to a set of answers, embed both the question and answers with the same model.
- Remember that different models produce embeddings in different vector spaces. An embedding from one model is not directly comparable to an embedding from a different model.
- If using zone selection, keep in mind that embedding generation is typically fast, but choosing a zone closer to your data or users may reduce latency or help you meet data residency requirements.
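As one way to apply the chunking advice in the first item above, the sketch below splits a document on blank lines and embeds the paragraphs in a single batch request. The splitting rule is deliberately naive; tokenizer-aware chunking depends on the model's token limit, which is not specified here, and the embed_paragraphs helper is ours, not part of the API.

import requests

API_URL = "https://api.cinclus.wayscloud.services/v1/embeddings"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY",
           "Content-Type": "application/json"}

def embed_paragraphs(document, model="cinclus-embed-001"):
    # Naive chunking: treat blank-line-separated paragraphs as chunks.
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    response = requests.post(API_URL, headers=HEADERS,
                             json={"model": model, "input": paragraphs})
    response.raise_for_status()
    # Pair each paragraph with its vector, in input order.
    data = sorted(response.json()["data"], key=lambda item: item["index"])
    return [(p, item["embedding"]) for p, item in zip(paragraphs, data)]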