The Messages API (POST /v1/messages) is the primary endpoint for generating responses. It accepts a conversation history and returns a model reply.
Set "stream": true to receive a Server-Sent Events stream instead of a single JSON response:
Pass tool_context to give the model access to ingested documents or databases.
Ingested documents (retrieval with citations):
SQL database (natural language to SQL):
Pass built-in server tools or custom tools in the tools array.
Built-in tool dependencies are granular. Install the specific extra you need, or use private-gpt[tools] as the bundle fallback. private-gpt[core] also includes that bundle.
Built-in server tools — reference by type:
Built-in tools only require name and type. Do not provide inputSchema for built-in tools. Add context only for built-in tools that require it. See Tools for the full chat-first reference, including skills, code execution, client tools, and per-tool examples.
Custom tools — define inputSchema (JSON Schema):
For the broadest tool-calling support, use private-gpt[tools] or private-gpt[core].
The model will return a tool_use block when it wants to call a tool. Your application runs the tool and sends the result back as a tool_result message.
Connect MCP servers to extend what tools the model can call:
Requires private-gpt[tool-mcp], or use private-gpt[tools] or private-gpt[core].
Estimate the token count of a request without running inference:
Returns {"input_tokens": 142}.
Dry-run a request to check it is valid without generating a response:
Returns {"valid": true} or a validation error. Useful for checking tool schemas, context filters, and model availability before executing.
For long-running generations, fire-and-forget patterns, or background processing, use the async API. See Async messages for the full reference.