Messages

POST https://host.com/v1/messages
Example request:

curl -X POST https://host.com/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "model",
    "messages": [
      {
        "role": "assistant",
        "content": "content"
      }
    ],
    "max_tokens": 1
  }'
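If you work in Python rather than the shell, you can build the same request with the standard library. This is a minimal sketch. It reuses the placeholder host and values from the curl example and only constructs the request. Sending it, for example with `urllib.request.urlopen`, is left to the caller. The helper name `build_messages_request` is hypothetical. The sketch adds no authentication header because this page does not document the scheme, although the 401 error listed below suggests that one is required.

```python
import json
import urllib.request

API_URL = "https://host.com/v1/messages"  # placeholder host from the curl example


def build_messages_request(model: str, messages: list, max_tokens: int, **extra) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/messages request.

    build_messages_request is a hypothetical helper, not part of any SDK.
    Extra keyword arguments (for example stream=True) are merged into the JSON body.
    """
    if max_tokens < 1:
        raise ValueError("max_tokens must be >= 1")
    body = {"model": model, "messages": messages, "max_tokens": max_tokens, **extra}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_messages_request(
    model="model",
    messages=[{"role": "assistant", "content": "content"}],
    max_tokens=1,
)
print(req.get_method(), req.full_url)  # POST https://host.com/v1/messages
```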
Example response:

{
  "id": "msg_12345",
  "type": "message",
  "role": "assistant",
  "model": "private-gpt",
  "container": {
    "id": "id",
    "expires_at": "2024-01-15T09:30:00Z"
  },
  "stop_details": {
    "type": "refusal",
    "category": "cyber",
    "explanation": "explanation"
  },
  "stop_reason": "end_turn",
  "stop_sequence": "stop_sequence",
  "usage": {
    "cache_creation": {
      "ephemeral_1h_input_tokens": 1,
      "ephemeral_5m_input_tokens": 1
    },
    "cache_creation_input_tokens": 1,
    "cache_read_input_tokens": 1,
    "inference_geo": "inference_geo",
    "input_tokens": 432,
    "output_tokens": 89,
    "server_tool_use": {
      "web_fetch_requests": 1,
      "web_search_requests": 1
    },
    "service_tier": "standard"
  }
}
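You can read the `usage` object in the example response with a few lines of Python. The sketch below parses a trimmed copy of the sample above and sums `input_tokens` and `output_tokens`. Treating that sum as the total is an assumption, because this page does not say how `cache_creation_input_tokens` and `cache_read_input_tokens` should be counted. The helper name `summarize_usage` is hypothetical.

```python
import json

# Trimmed copy of the example response above (only the fields used here).
SAMPLE_RESPONSE = """
{
  "id": "msg_12345",
  "type": "message",
  "role": "assistant",
  "model": "private-gpt",
  "stop_reason": "end_turn",
  "usage": {
    "cache_creation_input_tokens": 1,
    "cache_read_input_tokens": 1,
    "input_tokens": 432,
    "output_tokens": 89,
    "service_tier": "standard"
  }
}
"""


def summarize_usage(response: dict) -> dict:
    """Return a small summary of token usage (hypothetical helper)."""
    usage = response.get("usage", {})
    input_tokens = usage.get("input_tokens", 0)
    output_tokens = usage.get("output_tokens", 0)
    return {
        "model": response.get("model"),
        "stop_reason": response.get("stop_reason"),
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        # Assumption: cache token counts are reported separately and not added here.
        "total_tokens": input_tokens + output_tokens,
    }


summary = summarize_usage(json.loads(SAMPLE_RESPONSE))
print(summary["total_tokens"])  # 521
```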

Generate a chat completion from a conversation history.

This endpoint supports multi-turn conversations with the AI model, optional tool use, and validation of message structure.

Key Features:

  • Multi-turn conversations: support for system, user, and assistant messages.
  • Tool support: full validation of tool use/result blocks, with automatic or manual tool selection.
  • Citations: set system.citations.enabled to include references in responses.
  • Streaming: set stream to receive partial output in real time.
  • Default prompts: set system.use_default_prompt to apply Zylon's default prompts.
  • Thinking: set thinking.enabled to turn on step-by-step reasoning.
  • Sampling parameters: control randomness with temperature, top_p, top_k, and related parameters.

Notes:

  • Tool use and tool result blocks must be correctly paired within assistant messages.
  • tool_choice.type must be "auto", "tool", or "none".
  • When tool_choice.type is "tool", tool_choice.name must name a valid tool.
  • All message content is validated for completeness and correct structure.
  • The last message must come from the user or the assistant.
  • MCP servers (mcp_servers) provide external tools through the Model Context Protocol.
  • Sampling parameters control response randomness and token selection.
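You can check some of the rules in the notes on the client before sending a request. The function below is a hypothetical pre-flight check, not the server's validator. It covers only the tool_choice rules and the last-message rule. It assumes that each entry in `tools` has a `name` field and that "a valid tool" means one of the tools listed in the request. Tools exposed by MCP servers would not be visible to this check.

```python
ALLOWED_TOOL_CHOICE_TYPES = {"auto", "tool", "none"}
ALLOWED_LAST_ROLES = {"user", "assistant"}


def preflight_errors(payload: dict) -> list[str]:
    """Return problems found in a request payload (empty list if none). Hypothetical helper."""
    errors = []

    messages = payload.get("messages") or []
    if not messages:
        # Assumption: the last-message rule implies at least one message.
        errors.append("messages must not be empty")
    elif messages[-1].get("role") not in ALLOWED_LAST_ROLES:
        errors.append("last message must be from user or assistant")

    tool_choice = payload.get("tool_choice")
    if tool_choice is not None:
        choice_type = tool_choice.get("type")
        if choice_type not in ALLOWED_TOOL_CHOICE_TYPES:
            errors.append(f"tool_choice.type must be one of {sorted(ALLOWED_TOOL_CHOICE_TYPES)}")
        elif choice_type == "tool":
            # Assumption: each tool definition carries a "name" field.
            tool_names = {tool.get("name") for tool in payload.get("tools") or []}
            if tool_choice.get("name") not in tool_names:
                errors.append("tool_choice.name must match a tool in tools")
    return errors


ok = {
    "messages": [{"role": "user", "content": "Hi"}],
    "tools": [{"name": "search"}],
    "tool_choice": {"type": "tool", "name": "search"},
}
print(preflight_errors(ok))  # []
```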

Request

The request body is a JSON object with the following fields.
model (string, required, defaults to "default")
Model identifier or alias.
messages (list of objects, required)
Conversation messages for the request.
max_tokens (integer, required, >= 1)
Maximum number of tokens to generate in the response.
system (list of objects, optional)
System prompt input. Accepts str, list[str], System, list[System], or null, and is normalized internally to list[System].
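You can mirror the normalization described for `system` on the client, which helps when you want to attach per-block options such as citations. This sketch assumes that a System object is a dict with a `text` field. That field name is a guess, because this page only names the `citations.enabled` and `use_default_prompt` options. Mapping null to an empty list is also an assumption. The function name `normalize_system` is hypothetical.

```python
from typing import Union

SystemBlock = dict  # assumption: a System object is a JSON object with a "text" field
SystemInput = Union[None, str, SystemBlock, list]


def normalize_system(system: SystemInput) -> list[SystemBlock]:
    """Normalize str, list[str], System, list[System], or None to list[System]."""
    if system is None:
        return []  # assumption: null means "no system prompt"
    if isinstance(system, (str, dict)):
        system = [system]
    normalized = []
    for item in system:
        if isinstance(item, str):
            normalized.append({"text": item})  # hypothetical field name
        elif isinstance(item, dict):
            normalized.append(item)
        else:
            raise TypeError(f"unsupported system entry: {type(item).__name__}")
    return normalized


print(normalize_system("Be concise."))  # [{'text': 'Be concise.'}]
```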

tools (list of objects, optional)
Tool definitions available to the model.
thinking (object, optional)
Thinking configuration.
tool_choice (object, optional)
Tool selection policy.
output_config (object, optional)
Output configuration options.
cache_control (object, optional)
Request-level cache control.
stream (boolean, optional)
Whether to stream the response back to the client.
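When `stream` is true, the response arrives incrementally. This page does not document the wire format, so the sketch below assumes server-sent events (SSE). Under that assumption, each event's payload is JSON carried in `data:` lines, and a blank line ends each event. Check the actual stream before relying on this. The parser handles only data lines and ignores event names, ids, and comments. `iter_sse_json` is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator


def iter_sse_json(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one JSON payload per SSE event from an iterable of text lines.

    Assumption: the stream uses SSE framing with JSON in "data:" fields.
    Multiple data lines in one event are joined with newlines, as SSE specifies.
    """
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if data_lines:
                yield json.loads("\n".join(data_lines))
                data_lines = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE removes a single leading space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) and comment lines (":") are ignored.
    # An event not terminated by a blank line is discarded, as in the SSE specification.


sample = ["event: delta", 'data: {"n": 1}', "", 'data: {"n": 2}', ""]
for event in iter_sse_json(sample):
    print(event)
```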
tool_context (list of objects, optional)
Context to provide to the tools, such as documents, database connection strings, or other data relevant to tool usage.
mcp_servers (list of objects, optional)
MCP servers to use for tool retrieval. Each server can have its own configuration.
container (string, optional)
Container identifier for reuse across requests.
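The `container` object in the response, which has `id` and `expires_at` fields, pairs with this request parameter. One plausible pattern is to copy `container.id` from one response into the next request while the container has not expired. The sketch below implements that pattern, assuming this is how reuse is meant to work. `with_container` is a hypothetical helper, and its one-minute safety margin is arbitrary.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def with_container(payload: dict, previous_response: dict, now: Optional[datetime] = None) -> dict:
    """Return a copy of payload that reuses the previous response's container, if still valid."""
    container = previous_response.get("container") or {}
    container_id = container.get("id")
    expires_at = container.get("expires_at")
    if not container_id or not expires_at:
        return dict(payload)
    # expires_at is ISO 8601 such as "2024-01-15T09:30:00Z"; "Z" is rewritten for older Pythons.
    expiry = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    if expiry - now <= timedelta(minutes=1):  # arbitrary safety margin
        return dict(payload)
    return {**payload, "container": container_id}
```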
response_format (object, optional)
Deprecated response format. Use output_config.format instead.
priority (integer, optional)
Priority of the request, used for prioritizing responses.
seed (integer, optional)
Random seed for reproducibility.
min_p (double, optional)
Minimum probability threshold for token selection. Tokens with probability below this value are filtered out.
top_p (double, optional, 0 to 1)
Nucleus sampling parameter. Only tokens with cumulative probability up to this value are considered.
temperature (double, optional, 0 to 1)
Controls randomness in generation. Higher values make output more random; lower values make it more deterministic.
top_k (integer, optional, >= 0)
Limits token selection to the K most likely tokens at each step.
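The sketch below makes `temperature`, `top_k`, and `top_p` concrete by applying them to a toy set of logits, following a common implementation. It divides the logits by the temperature and applies softmax. It then keeps the K most likely tokens, keeps the smallest prefix whose cumulative probability reaches `top_p`, and renormalizes. This page does not document the server's order of operations or how it handles edge cases such as temperature 0 or top_k 0, so treat this as illustrative. The sketch omits `min_p` because this page describes it as an absolute threshold, whereas common engines scale it by the top token's probability.

```python
import math


def filter_distribution(logits: dict[str, float], temperature: float = 1.0,
                        top_k: int = 0, top_p: float = 1.0) -> dict[str, float]:
    """Return the renormalized token distribution after temperature, top-k, and top-p.

    Assumptions (not documented by this API): temperature 0 means greedy decoding,
    and top_k 0 disables the top-k filter.
    """
    if temperature == 0:
        best = max(logits, key=logits.get)
        return {best: 1.0}
    # Temperature-scaled softmax (subtract the max for numerical stability).
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(((w / total, tok) for tok, w in weights.items()), reverse=True)

    if top_k > 0:
        ranked = ranked[:top_k]

    # Nucleus (top-p): keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for prob, tok in ranked:
        kept.append((prob, tok))
        cumulative += prob
        if cumulative >= top_p:
            break

    norm = sum(prob for prob, _ in kept)
    return {tok: prob / norm for prob, tok in kept}


logits = {"a": 2.0, "b": 1.0, "c": 0.0, "d": -1.0}
print(filter_distribution(logits, top_k=2))  # keeps only "a" and "b"
```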
repetition_penalty (double, optional)
Penalty applied to tokens that have already appeared in the sequence, to reduce repetition.
presence_penalty (double, optional)
Penalty applied based on whether a token has already appeared in the text, encouraging topic diversity.
frequency_penalty (double, optional)
Penalty applied based on how often a token has appeared in the text, reducing repetitive content.
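These penalties are usually applied to logits before sampling. This page does not specify the server's formulas, so the sketch below uses two widely used formulations as stand-ins. For the presence and frequency penalties, it subtracts `presence_penalty` once for any token already seen and `frequency_penalty` once per occurrence, as in the OpenAI-style definition. For `repetition_penalty`, it divides positive logits and multiplies negative logits of seen tokens, as in the CTRL-style definition. `apply_penalties` is a hypothetical helper.

```python
from collections import Counter


def apply_penalties(logits: dict[str, float], generated: list[str],
                    repetition_penalty: float = 1.0, presence_penalty: float = 0.0,
                    frequency_penalty: float = 0.0) -> dict[str, float]:
    """Return adjusted logits given the tokens generated so far (illustrative formulas)."""
    counts = Counter(generated)
    adjusted = {}
    for tok, logit in logits.items():
        count = counts.get(tok, 0)
        if count:
            # CTRL-style repetition penalty: move seen tokens toward lower scores.
            logit = logit / repetition_penalty if logit > 0 else logit * repetition_penalty
            # OpenAI-style additive penalties: once for presence, once per occurrence.
            logit -= presence_penalty + frequency_penalty * count
        adjusted[tok] = logit
    return adjusted
```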
stop_sequences (list of strings, optional)
Custom sequences that stop generation when the model produces one of them.
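You can picture stop-sequence handling as scanning the generated text for the earliest match among `stop_sequences` and cutting the output there. The sketch assumes that the matched sequence is excluded from the returned text and reported separately, which corresponds to the `stop_sequence` field in the response. This page does not state whether the server trims the text the same way. `truncate_at_stop` is a hypothetical helper.

```python
from typing import Optional


def truncate_at_stop(text: str, stop_sequences: list[str]) -> tuple[str, Optional[str]]:
    """Cut text at the earliest stop sequence; return (text, matched_sequence or None)."""
    best_index, best_seq = None, None
    for seq in stop_sequences:
        if not seq:
            continue  # ignore empty strings, which would match everywhere
        index = text.find(seq)
        if index != -1 and (best_index is None or index < best_index):
            best_index, best_seq = index, seq
    if best_index is None:
        return text, None
    return text[:best_index], best_seq
```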
metadata (object, optional)
Request metadata (for example, user_id).
service_tier (enum, optional)
Service tier preference (for example, "auto" or "standard_only").
inference_geo (string, optional)
Geographic region hint for inference processing.
correlation_id (string, optional)
Correlation ID for tracking the request across systems.
maximum_loaded_skills (integer, optional, >= 1)
Optional cap on the number of skills loaded concurrently in a conversation. When the cap is exceeded, the oldest loaded skill is evicted.
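The eviction rule for `maximum_loaded_skills`, which drops the oldest loaded skill once the cap is exceeded, is a first-in, first-out policy. The sketch models it with an `OrderedDict` keyed by skill name. The class name and skill names are hypothetical. Leaving a skill's position unchanged when it is loaded again is also an assumption, not documented behavior.

```python
from collections import OrderedDict


class LoadedSkills:
    """Track loaded skills with a cap; evict the oldest-loaded skill when exceeded (hypothetical)."""

    def __init__(self, maximum_loaded_skills: int):
        if maximum_loaded_skills < 1:
            raise ValueError("maximum_loaded_skills must be >= 1")
        self.cap = maximum_loaded_skills
        self._skills: "OrderedDict[str, object]" = OrderedDict()

    def load(self, name: str, skill: object = None) -> list[str]:
        """Load a skill and return the names of any skills evicted as a result."""
        if name in self._skills:
            # Assumption: reloading keeps the original load order (pure FIFO).
            self._skills[name] = skill
            return []
        self._skills[name] = skill
        evicted = []
        while len(self._skills) > self.cap:
            oldest, _ = self._skills.popitem(last=False)
            evicted.append(oldest)
        return evicted

    def names(self) -> list[str]:
        return list(self._skills)
```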
context_management (any, optional)

Response

The response body is a JSON object with the following fields.
id (string)
Message identifier.
type ("message")
Object type.
role ("assistant")
Message author role.
content (list of objects)
Assistant content blocks.
model (string, defaults to "private-gpt")
Name of the model used.
container (object)
Optional execution container.
stop_details (object)
Optional structured stop details.
stop_reason (string)
Reason generation stopped.
stop_sequence (string)
Matched stop sequence, if any.
usage (object)
Token usage statistics.

Errors

401 Unauthorized
422 Unprocessable Entity
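A client usually wants to handle the two documented errors differently. A 401 points to missing or invalid credentials. A 422 indicates a request the server could not process, most likely one that failed the message validation described in the notes. The sketch below sends a request with `urllib` and turns these status codes into more specific exceptions. The exception and function names are hypothetical. The error body is kept as plain text because this page does not document its shape.

```python
import urllib.error
import urllib.request


class MessagesAPIError(Exception):
    """Base class for errors raised by this hypothetical client."""


class UnauthorizedError(MessagesAPIError):
    pass


class UnprocessableRequestError(MessagesAPIError):
    pass


def error_for_status(code: int, body: str = "") -> MessagesAPIError:
    """Map an HTTP status from POST /v1/messages to a more specific exception."""
    if code == 401:
        return UnauthorizedError("401 Unauthorized: check the request credentials")
    if code == 422:
        return UnprocessableRequestError(f"422 Unprocessable Entity: {body}")
    return MessagesAPIError(f"HTTP {code}: {body}")


def send(request: urllib.request.Request) -> bytes:
    """Send a prepared request and return the raw response body."""
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        body = err.read().decode("utf-8", errors="replace")
        raise error_for_status(err.code, body) from err
```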