The async messages API decouples request submission from response consumption. Instead of holding an HTTP connection open until the model finishes, you start a job, get back a message_id, and consume the output whenever you are ready.
Use async when:
Async chat uses an in-process stream broker, so no separate worker is needed. Use the memory broker for single-instance deployments or the redis broker for multi-instance deployments.
Response:
The request body is identical to that of POST /v1/messages: all fields (tools, tool_context, mcp_servers, sampling params) work the same way. The same per-tool dependency rules also apply: install the specific extra you need, or use private-gpt[tools] or private-gpt[core].
Connect an SSE client to receive events as the model generates them. Events follow the same format as synchronous streaming:
The connection stays open until the job completes, fails, or is cancelled. You can connect and disconnect at any time; when you reconnect, the stream replays from the last position.
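To illustrate what consuming the stream involves on the client side, here is a minimal sketch of a parser for the standard Server-Sent Events wire format. It is not the project's own client. The event names in the usage example (chunk, done) are made up for illustration, and whether the server sends id fields is an assumption; the parser simply records them if they appear.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class SSEEvent(NamedTuple):
    event: str          # event type; SSE defaults to "message" when no "event:" field is sent
    data: str           # payload; multiple "data:" lines are joined with newlines
    id: Optional[str]   # most recent "id:" value seen, if the server sends one


def parse_sse(lines: Iterable[str]) -> Iterator[SSEEvent]:
    """Group raw SSE lines into events; a blank line ends each event."""
    event_type, data_parts, last_id = "message", [], None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the buffered event, but only if it carried data.
            if data_parts:
                yield SSEEvent(event_type, "\n".join(data_parts), last_id)
            event_type, data_parts = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the format strips one leading space after the colon
        if field == "event":
            event_type = value
        elif field == "data":
            data_parts.append(value)
        elif field == "id":
            last_id = value


# Usage with hypothetical event names; real events follow the synchronous streaming format.
sample = ["event: chunk", "data: Hello", "", ": keep-alive", "event: done", "data: {}", ""]
for evt in parse_sse(sample):
    print(evt.event, evt.data)
```

In practice the lines would come from an HTTP response read line by line; the parser is kept separate from the transport so it can be exercised without a running server.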
Poll the status endpoint to inspect the job without consuming the stream:
Response:
Status values:
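Polling can be wrapped in a small helper, sketched here. The terminal status names (completed, failed, cancelled) are assumptions inferred from the wording "completes, fails, or is cancelled" and may not match the exact strings the API returns. The get_status callable is a hypothetical stand-in for whatever function performs the HTTP request to the status endpoint.

```python
import time
from typing import Callable

# Assumed terminal status names; adjust to the values the API actually returns.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_job(
    get_status: Callable[[], str],
    interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status() until it returns a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        sleep(interval)  # wait before the next poll


# Usage with a canned sequence of statuses instead of real HTTP calls.
canned = iter(["pending", "processing", "completed"])
print(wait_for_job(lambda: next(canned), interval=0.0))
```

Passing the sleep function as a parameter keeps the helper easy to test; production code would leave the default in place.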
Cancel a job while it is pending or processing:
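Because cancellation only applies to jobs that are pending or processing, a client may want to check the last known status before issuing a cancel request. The guard below is a minimal sketch; the status strings are taken from the sentence above and are assumed to match the values the status endpoint reports.

```python
# Statuses in which the text says a job can be cancelled (assumed to be the literal API values).
CANCELLABLE_STATUSES = {"pending", "processing"}


def can_cancel(status: str) -> bool:
    """Return True when a job in the given status is still eligible for cancellation."""
    return status in CANCELLABLE_STATUSES


# Usage: skip the cancel request for jobs that have already finished.
for status in ("pending", "processing", "completed"):
    print(status, can_cancel(status))
```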
Delete the job and free associated resources once you have consumed the result:
Async chat streams are handled in-process, so no separate worker is required. The broker is configured under stream.broker in settings.yaml:
For Redis, configure the connection: