Messages


The Messages API (POST /v1/messages) is the primary endpoint for generating responses. It accepts a conversation history and returns a model reply.


Basic usage

curl http://localhost:8080/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.5:35b",
    "messages": [
      {"role": "user", "content": "Explain retrieval-augmented generation."}
    ]
  }'
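
The same request can be built from Python. The following is a minimal sketch using only the standard library; the helper names `build_messages_request` and `send` are illustrative and not part of any SDK, and the reply is returned as decoded JSON without assuming anything about its fields.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # same host and port as the curl example


def build_messages_request(model, messages, base_url=BASE_URL, **extra):
    """Build (but do not send) a POST /v1/messages request.

    `extra` carries optional top-level fields such as "stream" or "max_tokens".
    """
    payload = {"model": model, "messages": messages, **extra}
    return urllib.request.Request(
        url=f"{base_url}/v1/messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return the decoded JSON body (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_messages_request(
    "qwen3.5:35b",
    [{"role": "user", "content": "Explain retrieval-augmented generation."}],
)
print(request.get_method(), request.full_url)
```

With a server running, `send(request)` performs the call and returns the parsed reply.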

Streaming

Set "stream": true to receive a Server-Sent Events stream instead of a single JSON response:

curl http://localhost:8080/v1/messages \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen3.5:35b", "stream": true, "messages": [...]}'
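
A client has to split the Server-Sent Events stream into individual events. The sketch below parses the generic SSE wire format (`event:` and `data:` fields, blank line as event terminator, `:` comment lines). The event names and JSON payloads in the sample are hypothetical, since this page does not describe what the stream actually carries.

```python
import json


def iter_sse_events(lines):
    """Yield (event_name, data) pairs from an iterable of decoded SSE lines."""
    event_name, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield event_name, "\n".join(data_lines)
            event_name, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "event":
            event_name = value
        elif field == "data":
            data_lines.append(value)


# Hypothetical stream content, for illustration only.
sample = [
    "event: delta\n",
    'data: {"text": "Hello"}\n',
    "\n",
    ": keep-alive\n",
    'data: {"text": " world"}\n',
    "\n",
]
events = [(name, json.loads(data)) for name, data in iter_sse_events(sample)]
print(events)
```

Against a live connection, the same function can be fed the decoded lines of the HTTP response, for example `(line.decode("utf-8") for line in response)` with `urllib.request.urlopen`.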

Context — documents, databases, web

Pass tool_context to give the model access to ingested documents or databases.

Ingested documents (retrieval with citations):

{
  "model": "qwen3.5:35b",
  "messages": [{"role": "user", "content": "Summarise the contract."}],
  "tool_context": [
    {
      "type": "ingested_artifact",
      "context_filter": {
        "collection": "my-collection",
        "artifacts": ["artifact-id-1"]
      }
    }
  ]
}

SQL database (natural language to SQL):

{
  "tool_context": [
    {
      "type": "sql_database",
      "connection_string": "postgresql://user:pass@localhost:5432/mydb",
      "description": "Sales database"
    }
  ]
}
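
When requests are assembled in code, small builder functions keep the two `tool_context` shapes consistent. This is a sketch only: the function names are illustrative, and the fields are limited to those shown in the JSON examples above.

```python
import json


def ingested_artifact_context(collection, artifacts):
    """tool_context entry for retrieval over ingested documents."""
    return {
        "type": "ingested_artifact",
        "context_filter": {"collection": collection, "artifacts": list(artifacts)},
    }


def sql_database_context(connection_string, description):
    """tool_context entry for natural-language-to-SQL over a database."""
    return {
        "type": "sql_database",
        "connection_string": connection_string,
        "description": description,
    }


payload = {
    "model": "qwen3.5:35b",
    "messages": [{"role": "user", "content": "Summarise the contract."}],
    "tool_context": [ingested_artifact_context("my-collection", ["artifact-id-1"])],
}
print(json.dumps(payload, indent=2))
```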

Tools

Pass built-in server tools or custom tools in the tools array.

Built-in tool dependencies are granular. Install the specific extra you need, or use private-gpt[tools] as the bundle fallback. private-gpt[core] also includes that bundle.

Built-in server tools — reference by type:

Built-in tools only require name and type. Do not provide inputSchema for built-in tools. Add context only for built-in tools that require it. See Tools for the full chat-first reference, including skills, code execution, client tools, and per-tool examples.

{
  "tools": [
    {"name": "search_docs", "type": "semantic_search_v1"},
    {"name": "analyze_sales", "type": "tabular_analysis_v1"},
    {"name": "query_db", "type": "database_query_v1"},
    {"name": "search_web", "type": "web_search_v1"},
    {"name": "fetch_url", "type": "web_fetch_v1"},
    {"name": "skills", "type": "skills_v1"},
    {"name": "code_execution", "type": "code_execution_v1"}
  ]
}

Custom tools — define inputSchema (JSON Schema):

For the broadest tool-calling support, use private-gpt[tools] or private-gpt[core].

{
  "tools": [
    {
      "name": "get_weather",
      "description": "Get current weather for a city",
      "inputSchema": {
        "type": "object",
        "properties": {
          "city": {"type": "string"}
        },
        "required": ["city"]
      }
    }
  ]
}

The model will return a tool_use block when it wants to call a tool. Your application runs the tool and sends the result back as a tool_result message.
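
The application side of that round trip can be sketched as a small dispatcher. The exact shape of the `tool_use` block and of the `tool_result` message is not shown on this page, so the field names used here (`id`, `name`, `input`, `tool_use_id`, `is_error`) are assumptions to verify against the API reference; `get_weather` is a stand-in implementation.

```python
import json


def get_weather(city):
    """Stand-in for a real weather lookup."""
    return {"city": city, "forecast": "placeholder"}


# Local implementations, keyed by the tool "name" declared in the request.
TOOL_HANDLERS = {"get_weather": get_weather}


def run_tool_use(block, handlers=TOOL_HANDLERS):
    """Run one tool_use block and build the message that reports its result.

    ASSUMPTION: the block carries "id", "name" and "input", and the result goes
    back as a user message holding a tool_result block with "tool_use_id".
    """
    handler = handlers.get(block["name"])
    if handler is None:
        content, is_error = f"unknown tool: {block['name']}", True
    else:
        try:
            content, is_error = json.dumps(handler(**block["input"])), False
        except Exception as exc:  # report tool failures back to the model
            content, is_error = f"tool failed: {exc}", True
    return {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": content,
                "is_error": is_error,
            }
        ],
    }


# Hypothetical tool_use block, for illustration only.
block = {"type": "tool_use", "id": "call-1", "name": "get_weather", "input": {"city": "Madrid"}}
result_message = run_tool_use(block)
print(json.dumps(result_message, indent=2))
```

Appending `result_message` to `messages` and posting the conversation again lets the model continue with the tool output.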


MCP servers

Connect MCP servers to extend the set of tools the model can call. This requires the private-gpt[tool-mcp] extra, or the private-gpt[tools] or private-gpt[core] bundle.

Example request fragment:

{
  "mcp_servers": [
    {
      "name": "my-mcp",
      "url": "https://my-mcp-server.example.com",
      "authorization_token": "token"
    }
  ]
}
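
In application code the token would normally come from configuration rather than a literal. A minimal sketch, assuming the token is stored in an environment variable; the variable name `MY_MCP_TOKEN` and the helper `mcp_server_entry` are hypothetical:

```python
import os


def mcp_server_entry(name, url, token_env_var):
    """Build one mcp_servers entry, reading the token from the environment."""
    token = os.environ.get(token_env_var)
    if not token:
        raise RuntimeError(f"environment variable {token_env_var} is not set")
    return {"name": name, "url": url, "authorization_token": token}


os.environ.setdefault("MY_MCP_TOKEN", "token")  # demo value only
payload_fragment = {
    "mcp_servers": [
        mcp_server_entry("my-mcp", "https://my-mcp-server.example.com", "MY_MCP_TOKEN")
    ]
}
print(payload_fragment["mcp_servers"][0]["name"])
```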

Sampling parameters

| Parameter | Description |
|---|---|
| temperature | Randomness (0 = deterministic, higher = more creative) |
| top_p | Nucleus sampling: cumulative probability mass |
| top_k | Limit selection to top K tokens |
| max_tokens | Maximum tokens to generate |
| stop_sequences | List of strings that stop generation when matched |
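
To make the first three parameters concrete, the toy example below applies temperature, top-k, and top-p to a hand-written distribution. It illustrates the general technique only and is not the server's implementation; in particular, applying top-k before top-p and accumulating top-p over the unrenormalised probabilities are assumptions, and real samplers differ in such details.

```python
import math


def apply_temperature(logits, temperature):
    """Softmax over logits / temperature; temperature 0 picks the argmax."""
    if temperature == 0:
        best = logits.index(max(logits))
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k_top_p(probs, top_k=None, top_p=None):
    """Return renormalised {token_index: probability} after top-k, then top-p."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]
    if top_p is not None:
        kept, cumulative = [], 0.0
        for index, p in ranked:
            kept.append((index, p))
            cumulative += p
            if cumulative >= top_p:
                break  # smallest prefix whose mass reaches top_p
        ranked = kept
    total = sum(p for _, p in ranked)
    return {index: p / total for index, p in ranked}


sharp = apply_temperature([2.0, 1.0, 0.0], 0.5)
flat = apply_temperature([2.0, 1.0, 0.0], 2.0)
print(sharp[0] > flat[0])  # lower temperature concentrates mass on the top token

candidates = filter_top_k_top_p([0.5, 0.3, 0.15, 0.05], top_k=3, top_p=0.7)
print(sorted(candidates))  # indices of the tokens that remain eligible
```

In the last call, top_k=3 drops the least likely token, and top_p=0.7 then keeps only the first two, because their combined mass of 0.8 is the first prefix to reach 0.7.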

Count tokens

Estimate the token count of a request without running inference:

curl http://localhost:8080/v1/messages/count_tokens \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen3.5:35b", "messages": [...]}'

Returns {"input_tokens": 142}.
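
One common use of the count is a pre-flight budget check. The sketch below assumes that the prompt and the generated tokens share one context window; the window size of 8192 is a hypothetical value, not a property of any particular model.

```python
import json


def fits_context(count_response, max_tokens, context_window):
    """True if prompt tokens plus the generation budget fit in the window."""
    return count_response["input_tokens"] + max_tokens <= context_window


count_response = json.loads('{"input_tokens": 142}')  # response shape from above
print(fits_context(count_response, max_tokens=512, context_window=8192))  # True
```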


Validate

Dry-run a request to check it is valid without generating a response:

curl http://localhost:8080/v1/messages/validate \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen3.5:35b", "messages": [...]}'

Returns {"valid": true} or a validation error. Useful for checking tool schemas, context filters, and model availability before executing.
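
A validate-then-send wrapper might look like the sketch below. The transport is passed in as `post(path, payload)`, a caller-supplied function that returns the decoded JSON body, so the example runs without a server. The shape of a validation error is not documented here, so anything other than `{"valid": true}` is treated as a rejection; if the server signals errors through HTTP status codes instead, `post` would raise before this check.

```python
def send_if_valid(payload, post):
    """Validate `payload` first; call /v1/messages only when validation passes."""
    verdict = post("/v1/messages/validate", payload)
    if verdict.get("valid") is not True:
        raise ValueError(f"request rejected by validation: {verdict}")
    return post("/v1/messages", payload)


# Fake transport for illustration: records calls and returns canned bodies.
calls = []


def fake_post(path, payload):
    calls.append(path)
    if path.endswith("/validate"):
        return {"valid": True}
    return {"echo": payload["model"]}  # hypothetical reply body


reply = send_if_valid({"model": "qwen3.5:35b", "messages": []}, fake_post)
print(calls)
```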


Async messages

For long-running generations, fire-and-forget patterns, or background processing, use the async API. See Async messages for the full reference.