Messages | PrivateGPT

Generate a chat completion from a conversation history.

This endpoint enables multi-turn conversations with the AI model, with optional tool support and comprehensive message validation.

Key Features:

- Multi-turn conversations: Support for system, user, and assistant messages
- Tool Support: Full tool use/result validation with automatic or manual selection
- Citations: Enable `system.citations.enabled` to include references in responses
- Streaming: Enable `stream` to receive partial updates in real time
- Default Prompts: Enable `system.use_default_prompt` to use Zylon prompts
- Thinking: Enable `thinking.enabled` for step-by-step reasoning capabilities
- Sampling Parameters: Control randomness with temperature, top_p, top_k, etc.

Notes:

- Tool use/result blocks must be properly paired within assistant messages
- Tool choice type must be 'auto', 'tool', or 'none'
- When tool_choice.type is 'tool', tool_choice.name must specify a valid tool
- All message content is validated for completeness and proper structure
- Last message must be from user or assistant for proper conversation flow
- MCP servers provide external tool capabilities via Model Context Protocol
- Sampling parameters control response randomness and token selection
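
A few of the rules above can be checked on the client before a request is sent. The following is a minimal sketch, not the server's validator: `preflight_problems` is a hypothetical helper, and it assumes that each tool definition carries a `name` field and that an empty `messages` list should be rejected.

```python
from typing import Any

ALLOWED_TOOL_CHOICE_TYPES = {"auto", "tool", "none"}


def preflight_problems(payload: dict[str, Any]) -> list[str]:
    """Return a list of problems found by a client-side check (hypothetical helper)."""
    problems: list[str] = []

    # Rule: the last message must come from the user or the assistant.
    messages = payload.get("messages") or []
    if not messages:
        # Assumption: an empty conversation is treated as invalid.
        problems.append("messages must not be empty")
    elif messages[-1].get("role") not in ("user", "assistant"):
        problems.append("last message must be from user or assistant")

    # Rule: tool_choice.type is 'auto', 'tool', or 'none'.
    tool_choice = payload.get("tool_choice")
    if tool_choice is not None:
        choice_type = tool_choice.get("type")
        if choice_type not in ALLOWED_TOOL_CHOICE_TYPES:
            problems.append("tool_choice.type must be 'auto', 'tool', or 'none'")
        elif choice_type == "tool":
            # Assumption: tool definitions expose their name under "name".
            known = {tool.get("name") for tool in payload.get("tools") or []}
            if tool_choice.get("name") not in known:
                problems.append("tool_choice.name must match a defined tool")

    return problems
```

An empty list means none of the checked rules were violated; the server may still reject the request for reasons this sketch does not cover.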

Request

This endpoint expects an object.

model (string, required, defaults to default)

Model identifier or alias.

messages (list of objects, required)

Conversation messages for the request.

max_tokens (integer, required, >= 1)

Maximum number of tokens to generate in the response.

system (list of objects, optional)

System prompt input. Accepts str, list[str], System, list[System], or null. It is normalized internally to list[System].
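
The normalization described above might look roughly like the sketch below. It is an illustration only: the real `System` schema is not shown here, so the `text` field name, the mapping of null to an empty list, and the `normalize_system` helper itself are all assumptions.

```python
from typing import Any, Union


def normalize_system(system: Union[str, dict, list, None]) -> list[dict[str, Any]]:
    """Coerce the accepted input shapes into a list of System-like dicts."""
    if system is None:
        # Assumption: null means "no system prompt".
        return []

    # A single value is treated as a one-element list.
    items = system if isinstance(system, list) else [system]

    normalized: list[dict[str, Any]] = []
    for item in items:
        if isinstance(item, str):
            # Assumption: plain strings become objects with a "text" field.
            normalized.append({"text": item})
        elif isinstance(item, dict):
            normalized.append(dict(item))
        else:
            raise TypeError(f"unsupported system entry: {type(item).__name__}")
    return normalized
```

Under these assumptions, `normalize_system("Be concise.")` and `normalize_system(["Be concise."])` produce the same one-element list.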

tools (list of objects, optional)

Optional tool definitions.

thinking (object, optional)

Thinking configuration.

tool_choice (object, optional)

Tool selection policy.

output_config (object, optional)

Optional output configuration options.

cache_control (object, optional)

Optional request-level cache control.

stream (boolean, optional)

Whether to stream the response back to the client.

tool_context (list of objects, optional)

Context to provide to the tools, such as documents, database connection strings, or other data relevant to tool usage.

mcp_servers (list of objects, optional)

List of MCP servers to use for tool retrieval. Each server can have its own configuration.

container (string, optional)

Container identifier for reuse across requests.

response_format (object, optional)

Deprecated response format. Use output_config.format instead.
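
Callers that still send `response_format` could move the value before sending. The sketch below assumes the legacy object can be placed under `output_config.format` unchanged, which this page does not confirm; `migrate_response_format` is a hypothetical helper.

```python
import copy
from typing import Any


def migrate_response_format(payload: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the payload with response_format moved to output_config.format."""
    migrated = copy.deepcopy(payload)
    legacy = migrated.pop("response_format", None)
    if legacy is not None:
        output_config = dict(migrated.get("output_config") or {})
        # An explicit output_config.format wins over the deprecated field.
        output_config.setdefault("format", legacy)
        migrated["output_config"] = output_config
    return migrated
```

The input payload is left untouched, so the helper can be applied just before serialization.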

priority (integer, optional)

Priority of the request, used for prioritizing responses.

seed (integer, optional)

Random seed for reproducibility.

min_p (double, optional)

Minimum probability threshold for token selection. Tokens with probability below this value are filtered out.

top_p (double, optional, 0-1)

Nucleus sampling parameter. Only tokens with cumulative probability up to this value are considered.

temperature (double, optional, 0-1)

Controls randomness in generation. Higher values make output more random, lower values more deterministic.

top_k (integer, optional, >= 0)

Limits token selection to the top K most likely tokens at each step.
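
To make the effect of `top_k`, `min_p`, and `top_p` concrete, the sketch below applies them to a toy next-token distribution. It illustrates the general technique only, not how the server implements it: the order in which the filters run, the treatment of `top_k = 0` as "disabled", and the reading of `min_p` as an absolute threshold (some implementations scale it by the top token's probability) are assumptions.

```python
def filter_candidates(
    probs: dict[str, float],
    top_k: int = 0,
    top_p: float = 1.0,
    min_p: float = 0.0,
) -> dict[str, float]:
    """Filter a toy token distribution and renormalize what is left."""
    # Rank tokens from most to least likely.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)

    # top_k: keep only the K most likely tokens (0 is assumed to mean "no limit").
    if top_k > 0:
        ranked = ranked[:top_k]

    # min_p: drop tokens whose probability is below the threshold.
    ranked = [(token, p) for token, p in ranked if p >= min_p]

    # top_p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept: list[tuple[str, float]] = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept} if total > 0 else {}


toy = {"the": 0.5, "a": 0.25, "this": 0.1875, "that": 0.0625}
nucleus = filter_candidates(toy, top_p=0.7)  # keeps "the" and "a"
```

Lower `top_p` or `top_k` values shrink the candidate set, which makes sampling less varied.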

repetition_penalty (double, optional)

Penalty applied to tokens that have already appeared in the sequence to reduce repetition.

presence_penalty (double, optional)

Penalty applied based on whether a token has appeared in the text, encouraging topic diversity.

frequency_penalty (double, optional)

Penalty applied based on how frequently a token appears in the text, reducing repetitive content.

stop_sequences (list of strings, optional)

Custom stop sequences that stop generation when matched.

metadata (object, optional)

Request metadata (for example, user_id).

service_tier (enum, optional)

Service tier preference (for example, “auto” or “standard_only”).

inference_geo (string, optional)

Geographic region hint for inference processing.

correlation_id (string, optional)

Correlation ID for tracking the request across systems.

maximum_loaded_skills (integer, optional, >= 1)

Optional cap for concurrently loaded skills in a conversation. When exceeded, the oldest loaded skill is evicted.
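
The eviction rule can be pictured as a first-in, first-out cache. The sketch below is a toy model of that behaviour, not the server's code: the `LoadedSkills` class is hypothetical, and it assumes that loading an already loaded skill does not refresh its age.

```python
from collections import OrderedDict


class LoadedSkills:
    """Toy model of a skill set capped at maximum_loaded_skills."""

    def __init__(self, maximum_loaded_skills: int) -> None:
        if maximum_loaded_skills < 1:
            raise ValueError("maximum_loaded_skills must be >= 1")
        self._maximum = maximum_loaded_skills
        self._skills: OrderedDict[str, None] = OrderedDict()

    def load(self, name: str) -> list[str]:
        """Load a skill and return the names evicted to make room."""
        evicted: list[str] = []
        if name in self._skills:
            # Assumption: reloading keeps the original load order.
            return evicted
        self._skills[name] = None
        while len(self._skills) > self._maximum:
            oldest, _ = self._skills.popitem(last=False)
            evicted.append(oldest)
        return evicted

    def names(self) -> list[str]:
        """Return loaded skill names, oldest first."""
        return list(self._skills)
```

With a cap of 2, loading `pdf`, `xlsx`, and then `docx` evicts `pdf`, because it was loaded first.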

context_management (any, optional)

Response

This endpoint returns an object.

id (string)

Message identifier.

type ("message")

Object type.

role ("assistant")

Message author role.

content (list of objects)

Assistant content blocks.

model (string, defaults to private-gpt)

Model name used.

container (object)

Optional execution container.

stop_details (object)

Optional structured stop details.

stop_reason (string)

Message stop reason.

stop_sequence (string)

Matched stop sequence, if any.

usage (object)

Token usage stats.

Errors

401

Unauthorized

422

Unprocessable Entity

curl -X POST https://host.com/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "model",
    "messages": [
      {
        "role": "user",
        "content": "content"
      }
    ],
    "max_tokens": 1
  }'
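
The same request can be built in Python with the standard library. This is a sketch under a few assumptions: `build_messages_request` is a hypothetical helper, the host is the placeholder from the curl call, and no authentication header is set because this page does not show one (a 401 response suggests the real deployment needs it). The payload ends with a user message, as the notes require.

```python
import json
import urllib.request


def build_messages_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the messages endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/messages",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_messages_request(
    "https://host.com",
    {
        "model": "model",
        "messages": [{"role": "user", "content": "content"}],
        "max_tokens": 1,
    },
)

# Sending is left to the caller, for example:
# with urllib.request.urlopen(request) as response:
#     message = json.load(response)
```

Keeping construction separate from sending makes the payload easy to inspect or log before any network call is made.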

{
  "id": "msg_12345",
  "type": "message",
  "role": "assistant",
  "model": "private-gpt",
  "container": {
    "id": "id",
    "expires_at": "2024-01-15T09:30:00Z"
  },
  "stop_details": {
    "type": "refusal",
    "category": "cyber",
    "explanation": "explanation"
  },
  "stop_reason": "end_turn",
  "stop_sequence": "stop_sequence",
  "usage": {
    "cache_creation": {
      "ephemeral_1h_input_tokens": 1,
      "ephemeral_5m_input_tokens": 1
    },
    "cache_creation_input_tokens": 1,
    "cache_read_input_tokens": 1,
    "inference_geo": "inference_geo",
    "input_tokens": 432,
    "output_tokens": 89,
    "output_tokens_details": {
      "reasoning_tokens": 1
    },
    "server_tool_use": {
      "web_fetch_requests": 1,
      "web_search_requests": 1
    },
    "service_tier": "standard"
  }
}
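
A client will usually read `stop_reason` and the token counts from a response like the one above. The sketch below shows one way to do that; `summarize_response` is a hypothetical helper, and adding `input_tokens` to `output_tokens` is a convenience total only, since this page does not say whether cached tokens are already counted in `input_tokens`.

```python
def summarize_response(message: dict) -> dict:
    """Extract the stop reason and basic token counts from a response object."""
    usage = message.get("usage") or {}
    input_tokens = usage.get("input_tokens", 0)
    output_tokens = usage.get("output_tokens", 0)
    return {
        "stop_reason": message.get("stop_reason"),
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        # Convenience figure; not a field returned by the API.
        "total_tokens": input_tokens + output_tokens,
    }


sample = {
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 432, "output_tokens": 89},
}
summary = summarize_response(sample)
```

Missing fields fall back to zero or `None`, so the helper also works on partial responses.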