Messages

Generate a chat completion from a conversation history. This endpoint enables multi-turn conversations with the AI model, with optional tool support and comprehensive message validation. Key Features: - Multi-turn conversations: Support for system, user, and assistant messages - Tool Support: Full tool use/result validation with automatic or manual selection - Citations: Enable `system.citations.enabled` to include references in responses - Streaming: Enable `stream` for partial updates in real-time - Default Prompts: Enable `system.use_default_prompt` for using Zylon prompts - Thinking: Enable `thinking.enabled` for step-by-step reasoning capabilities - Sampling Parameters: Control randomness with temperature, top_p, top_k, etc. Notes: - Tool use/result blocks must be properly paired within assistant messages - Tool choice type must be 'auto', 'tool', or 'none' - When tool_choice.type is 'tool', tool_choice.name must specify a valid tool - All message content is validated for completeness and proper structure - Last message must be from user or assistant for proper conversation flow - MCP servers provide external tool capabilities via Model Context Protocol - Sampling parameters control response randomness and token selection

Request

This endpoint expects an object.
modelstringRequiredDefaults to default
Model identifier or alias.
messageslist of objectsRequired
Conversation messages for the request.
max_tokensintegerRequired>=1
Maximum number of tokens to generate in the response.
systemlist of objectsOptional

System prompt input. Accepts str, list[str], System, list[System], or null. It is normalized internally to list[System].

toolslist of objectsOptional
Optional tool definitions.
thinkingobjectOptional
Thinking configuration.
tool_choiceobjectOptional
Tool selection policy.
output_configobjectOptional
Optional output configuration options.
cache_controlobjectOptional

Optional request-level cache control.

streambooleanOptional
Whether to stream the response back to the client.
tool_contextlist of objectsOptional
Context to provide to the tools, such as documents, databases connection strings, or data relevant to tool usage.
mcp_serverslist of objectsOptional
List of MCP servers to use for tool retrieval. Each server can have its own configuration.
containerstringOptional
Container identifier for reuse across requests.
response_formatobjectOptional

Deprecated response format. Use output_config.format instead.

priorityintegerOptional
Priority of the request, used for prioritizing responses.
seedintegerOptional
Random seed for reproducibility.
min_pdoubleOptional
Minimum probability threshold for token selection. Tokens with probability below this value are filtered out.
top_pdoubleOptional0-1
Nucleus sampling parameter. Only tokens with cumulative probability up to this value are considered.
temperaturedoubleOptional0-1
Controls randomness in generation. Higher values make output more random, lower values more deterministic.
top_kintegerOptional>=0
Limits token selection to the top K most likely tokens at each step.
repetition_penaltydoubleOptional
Penalty applied to tokens that have already appeared in the sequence to reduce repetition.
presence_penaltydoubleOptional
Penalty applied based on whether a token has appeared in the text, encouraging topic diversity.
frequency_penaltydoubleOptional
Penalty applied based on how frequently a token appears in the text, reducing repetitive content.
stop_sequenceslist of stringsOptional
Custom stop sequences that stop generation when matched.
metadataobjectOptional

Request metadata (for example, user_id).

service_tierenumOptional

Service tier preference (for example, “auto” or “standard_only”).

inference_geostringOptional
Geographic region hint for inference processing.
correlation_idstringOptional
Correlation ID for tracking the request across systems.
maximum_loaded_skillsintegerOptional>=1
Optional cap for concurrently loaded skills in a conversation. When exceeded, the oldest loaded skill is evicted.
context_managementanyOptional

Response

This endpoint returns an object.
idstring
Message identifier.
type"message"
Object type.
role"assistant"
Message author role.
contentlist of objects
Assistant content blocks.
modelstringDefaults to private-gpt
Model name used.
containerobject
Optional execution container.
stop_detailsobject
Optional structured stop details.
stop_reasonstring
Message stop reason.
stop_sequencestring
Matched stop sequence, if any.
usageobject
Token usage stats.

Errors

401
Unauthorized
422
Unprocessable Entity