Messages

The Messages API (POST /v1/messages) is the primary endpoint for generating responses. It accepts a conversation history and returns a model reply.

Basic usage

curl http://localhost:8080/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.5:35b",
    "messages": [
      {"role": "user", "content": "Explain retrieval-augmented generation."}
    ]
  }'
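
The same request can be sent from Python. The sketch below uses only the standard library and assumes the server from the curl example is listening on localhost:8080. The helper names build_request and send are illustrative, and the reply is returned as parsed JSON without assuming any particular response fields.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # same host and port as the curl example


def build_request(prompt: str, model: str = "qwen3.5:35b") -> urllib.request.Request:
    """Build a POST /v1/messages request carrying a single user message."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and return the parsed JSON body (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a server running, send(build_request("Explain retrieval-augmented generation.")) returns the reply as a dictionary. Keeping request construction separate from sending makes the payload easy to inspect or log before it goes out.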

Streaming

Set "stream": true to receive a Server-Sent Events stream instead of a single JSON response:

curl http://localhost:8080/v1/messages \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen3.5:35b", "stream": true, "messages": [...]}'
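
A Server-Sent Events stream is plain text: each event is a run of field: value lines ended by a blank line. The sketch below parses only that framing, following the general SSE format. It makes no assumption about which event names or payloads the server emits, since those are not described here, and iter_sse_events is an illustrative name.

```python
from typing import Iterable, Iterator, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from decoded Server-Sent Events lines."""
    event, data = "message", []  # "message" is the SSE default event name
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; events without data are dropped.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is not part of the value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)  # multiple data lines are joined with newlines
```

When reading from urllib, the response object yields bytes line by line, so it can be fed in as (raw.decode("utf-8") for raw in response). If the data payloads are JSON, decode each one with json.loads.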

Context — documents, databases, web

Pass tool_context to give the model access to ingested documents or databases.

Ingested documents (retrieval with citations):

{
  "model": "qwen3.5:35b",
  "messages": [{"role": "user", "content": "Summarise the contract."}],
  "tool_context": [
    {
      "type": "ingested_artifact",
      "context_filter": {
        "collection": "my-collection",
        "artifacts": ["artifact-id-1"]
      }
    }
  ]
}
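
When the collection and artifact IDs come from application state, the body can be assembled in code. This is a sketch: document_request is a hypothetical helper, the field layout is copied from the JSON above rather than from a published schema, and rejecting an empty ID list is a local choice that the server may not share.

```python
from typing import Dict, List


def document_request(question: str, collection: str, artifact_ids: List[str],
                     model: str = "qwen3.5:35b") -> Dict:
    """Build a /v1/messages body that scopes retrieval to specific artifacts."""
    if not artifact_ids:
        # Local guard only; the server's handling of an empty list is not documented here.
        raise ValueError("pass at least one artifact id")
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "tool_context": [
            {
                "type": "ingested_artifact",
                "context_filter": {
                    "collection": collection,
                    "artifacts": list(artifact_ids),
                },
            }
        ],
    }
```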

SQL database (natural language to SQL):

{
  "tool_context": [
    {
      "type": "sql_database",
      "connection_string": "postgresql://user:pass@localhost:5432/mydb",
      "description": "Sales database"
    }
  ]
}

Tools

Pass built-in server tools or custom tools in the tools array.

Dependencies for built-in tools are split into separate extras. Install only the extra you need, or install private-gpt[tools] to get them as a single bundle. private-gpt[core] also includes that bundle.

Built-in server tools — reference by type:

Built-in tools require only name and type. Do not provide inputSchema for them, and add context only for the built-in tools that need it. See Tools for the full chat-first reference, including skills, code execution, client tools, and per-tool examples.

{
  "tools": [
    {"name": "search_docs", "type": "semantic_search_v1"},
    {"name": "analyze_sales", "type": "tabular_analysis_v1"},
    {"name": "query_db", "type": "database_query_v1"},
    {"name": "search_web", "type": "web_search_v1"},
    {"name": "fetch_url", "type": "web_fetch_v1"},
    {"name": "skills", "type": "skills_v1"},
    {"name": "code_execution", "type": "code_execution_v1"}
  ]
}
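
The two rules for built-in entries (a name and a type, no inputSchema) are easy to check on the client before sending. A minimal sketch follows; check_builtin_tools is a hypothetical helper, and it assumes that any entry carrying a type key is a built-in tool, which matches the examples on this page but is not stated as a rule.

```python
from typing import Dict, List


def check_builtin_tools(tools: List[Dict]) -> List[str]:
    """Return problems found in the built-in entries of a tools array."""
    problems = []
    for index, tool in enumerate(tools):
        if "type" not in tool:
            continue  # ASSUMPTION: entries without "type" are custom tools
        label = tool.get("name") or f"tools[{index}]"
        if "inputSchema" in tool:
            problems.append(f"{label}: built-in tool must not define inputSchema")
        if not tool.get("name"):
            problems.append(f"{label}: built-in tool needs a name")
    return problems
```

An empty list means no problems were found by this local check. The validate endpoint remains the authoritative check.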

Custom tools — define inputSchema (JSON Schema):

For the broadest tool-calling support, use private-gpt[tools] or private-gpt[core].

{
  "tools": [
    {
      "name": "get_weather",
      "description": "Get current weather for a city",
      "inputSchema": {
        "type": "object",
        "properties": {
          "city": {"type": "string"}
        },
        "required": ["city"]
      }
    }
  ]
}

The model will return a tool_use block when it wants to call a tool. Your application runs the tool and sends the result back as a tool_result message.
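
The round trip can be sketched as a small dispatcher. The exact shape of tool_use and tool_result blocks is not shown on this page, so every field name below (id, name, input, tool_use_id, and a user-role message carrying the results) is an assumption to check against the Tools reference. get_weather here is a placeholder for the custom tool defined above.

```python
from typing import Any, Callable, Dict, List


def get_weather(city: str) -> str:
    """Placeholder tool body; a real version would query a weather service."""
    return f"No live weather data for {city} in this sketch."


# Maps tool names from the request's tools array to local callables.
HANDLERS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(content: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Run every tool_use block in a reply and build the follow-up message."""
    results = []
    for block in content:
        if block.get("type") != "tool_use":
            continue  # text and other block types need no action here
        handler = HANDLERS.get(block["name"])
        if handler is None:
            output = f"unknown tool: {block['name']}"
        else:
            output = handler(**block["input"])  # ASSUMED: arguments arrive under "input"
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],  # ASSUMED field linking result to call
            "content": str(output),
        })
    # ASSUMED: results go back as the content of a user-role message.
    return {"role": "user", "content": results}
```

The returned message would be appended to the conversation history, after the model's reply, before the next call to /v1/messages.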

MCP servers

Connect MCP servers to extend the set of tools the model can call. This requires the private-gpt[tool-mcp] extra, or one of the bundles that include it: private-gpt[tools] or private-gpt[core].

{
  "mcp_servers": [
    {
      "name": "my-mcp",
      "url": "https://my-mcp-server.example.com",
      "authorization_token": "token"
    }
  ]
}

Sampling parameters

Parameter	Description
`temperature`	Randomness (0 = deterministic, higher = more creative)
`top_p`	Nucleus sampling — cumulative probability mass
`top_k`	Limit selection to top K tokens
`max_tokens`	Maximum tokens to generate
`stop_sequences`	List of strings that stop generation when matched
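
To make the first three parameters concrete, the sketch below applies them to a toy set of token scores. It illustrates the general technique only: the function name, the toy logits, and the order in which the filters are applied are assumptions, and inference backends differ in these details.

```python
import math
from typing import Dict, Optional


def filter_distribution(logits: Dict[str, float], temperature: float = 1.0,
                        top_k: Optional[int] = None,
                        top_p: Optional[float] = None) -> Dict[str, float]:
    """Return the renormalised token distribution after temperature, top_k and top_p."""
    if temperature <= 0:
        # Temperature 0 is treated as greedy decoding: only the best token survives.
        best = max(logits, key=logits.get)
        return {best: 1.0}
    # Softmax over temperature-scaled logits (peak subtracted for numerical stability).
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(((tok, w / total) for tok, w in weights.items()),
                    key=lambda pair: pair[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:max(top_k, 1)]  # keep the K most likely tokens
    if top_p is not None:
        kept, mass = [], 0.0
        for tok, prob in ranked:
            kept.append((tok, prob))
            mass += prob
            if mass >= top_p:  # stop once the cumulative mass reaches top_p
                break
        ranked = kept
    norm = sum(prob for _, prob in ranked)
    return {tok: prob / norm for tok, prob in ranked}
```

For logits {"a": 2.0, "b": 1.0, "c": 0.0}, raising the temperature to 2.0 lowers the probability of "a" relative to the default, top_k=2 removes "c", and top_p=0.5 leaves only "a" because it alone covers more than half of the probability mass.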

Count tokens

Estimate the token count of a request without running inference:

curl http://localhost:8080/v1/messages/count_tokens \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen3.5:35b", "messages": [...]}'

Returns the count as JSON, for example {"input_tokens": 142}.

Validate

Dry-run a request to check it is valid without generating a response:

curl http://localhost:8080/v1/messages/validate \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen3.5:35b", "messages": [...]}'

Returns {"valid": true} or a validation error. Useful for checking tool schemas, context filters, and model availability before executing.
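
The validate and count_tokens endpoints combine naturally into a pre-flight check. In the sketch below, preflight and token_budget are hypothetical names, and post stands for any function that sends a JSON body to a path and returns the parsed response. The format of a validation error is not specified on this page, so the sketch treats anything other than "valid": true as a failure and reports the whole response.

```python
from typing import Callable, Dict

# (path, json_body) -> parsed JSON response; supplied by the caller.
Post = Callable[[str, Dict], Dict]


def preflight(post: Post, body: Dict, token_budget: int) -> int:
    """Validate a request and check its size; return the estimated input tokens."""
    verdict = post("/v1/messages/validate", body)
    if verdict.get("valid") is not True:
        # Error shape is undocumented here, so surface the response as-is.
        raise ValueError(f"request rejected by validate: {verdict}")
    count = post("/v1/messages/count_tokens", body)["input_tokens"]
    if count > token_budget:
        raise ValueError(f"{count} input tokens exceeds budget of {token_budget}")
    return count
```

Passing the transport in as a function keeps the check independent of any HTTP library and lets it run against a stub in tests. If the server reports validation errors through an HTTP error status instead, the post function would raise before the valid check is reached.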

Async messages

For long-running generations, fire-and-forget patterns, or background processing, use the async API. See Async messages for the full reference.