
Async messages


The async messages API decouples request submission from response consumption. Instead of holding an HTTP connection open until the model finishes, you start a job, get back a message_id, and consume the output whenever you are ready.

Use async when:

  • Generation is long and you don’t want to hold a connection open.
  • You need to fan out multiple requests and collect results later.
  • You want to let a background worker process the job while the caller does other work.

Async chat uses an in-process stream broker, so no separate worker process is needed. Set the broker to memory for single-instance deployments or to redis for multi-instance deployments.
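The fan-out use case listed above can be sketched as follows. This is a minimal illustration, not part of the API: `fan_out` and the `start_job` callable are hypothetical names, and `start_job` is assumed to submit one prompt to POST /v1/messages/async and return the resulting message_id.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def fan_out(
    prompts: Iterable[str],
    start_job: Callable[[str], str],  # hypothetical: submits one prompt, returns its message_id
    max_workers: int = 4,
) -> Dict[str, str]:
    """Submit every prompt and return {message_id: prompt} for later collection."""
    prompts = list(prompts)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so IDs line up with prompts.
        message_ids = list(pool.map(start_job, prompts))
    return dict(zip(message_ids, prompts))
```

The caller keeps only the returned mapping and can stream or poll each message_id later, in any order.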


Lifecycle

POST /v1/messages/async → message_id (pending)
│
▼
GET /v1/messages/async/{id}/stream ←─ SSE stream of completion events
GET /v1/messages/async/{id}/status ←─ poll status at any time
│
▼
POST /v1/messages/async/{id}/cancel (optional — while processing)
DELETE /v1/messages/async/{id}/delete (clean up when done)

Start a job

curl -X POST http://localhost:8080/v1/messages/async \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.5:35b",
    "messages": [{"role": "user", "content": "Write a detailed report on..."}]
  }'

Response:

{"message_id": "msg_01abc..."}

The request body is identical to POST /v1/messages — all fields (tools, tool_context, mcp_servers, sampling params) work the same way. The same per-tool dependency rules apply here: install the specific extra you need, or use private-gpt[tools] or private-gpt[core].
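For callers that prefer Python over curl, the same request might look like the sketch below. It uses only the standard library; `BASE_URL`, `build_start_request`, and `start_job` are illustrative names, and the host and port are simply copied from the curl example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: same host/port as the curl example


def build_start_request(model: str, prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /v1/messages/async request without sending it."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/v1/messages/async",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_job(model: str, prompt: str, base_url: str = BASE_URL) -> str:
    """Send the request and return the message_id from the JSON response."""
    with urllib.request.urlopen(build_start_request(model, prompt, base_url)) as resp:
        return json.load(resp)["message_id"]
```

Splitting request construction from sending keeps the payload easy to inspect before anything goes over the network.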


Stream the output

Connect an SSE client to receive events as the model generates them. Events follow the same format as synchronous streaming:

curl http://localhost:8080/v1/messages/async/msg_01abc.../stream

The connection stays open until the job completes, fails, or is cancelled. You can connect and disconnect at any time — the stream replays from the last position on reconnect.
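A minimal sketch of consuming the stream from Python, assuming the endpoint emits standard Server-Sent Events framing (`event:` and `data:` fields, with a blank line ending each event). The event names and payload shapes are not listed on this page, so the parser stays generic; `iter_sse_events` and `stream_job` are illustrative names.

```python
import urllib.request
from typing import Dict, Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Group raw SSE lines into events; a blank line terminates each event."""
    event, data = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:  # events without any data field are not dispatched
                yield {"event": event or "message", "data": "\n".join(data)}
            event, data = None, []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is part of the framing
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


def stream_job(message_id: str, base_url: str = "http://localhost:8080") -> Iterator[Dict[str, str]]:
    """Open the stream endpoint and yield parsed events until the server closes it."""
    url = f"{base_url}/v1/messages/async/{message_id}/stream"
    with urllib.request.urlopen(url) as resp:
        yield from iter_sse_events(line.decode("utf-8") for line in resp)
```

Each yielded `data` value is left as a string; decode it according to whatever the synchronous streaming format specifies.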


Check status

Poll the status endpoint to inspect the job without consuming the stream:

curl http://localhost:8080/v1/messages/async/msg_01abc.../status

Response:

{
  "message_id": "msg_01abc...",
  "status": "processing",
  "created_at": "2026-05-26T10:00:00Z",
  "updated_at": "2026-05-26T10:00:05Z",
  "completed_at": null,
  "error_message": null
}

Status values:

| Status     | Meaning                                 |
|------------|-----------------------------------------|
| pending    | Job queued, worker not yet started      |
| processing | Worker is actively generating           |
| completed  | Generation finished successfully        |
| failed     | Generation failed; check error_message  |
| cancelled  | Job was cancelled before completion     |
| error      | Internal error                          |
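A polling loop over these status values might look like the sketch below. Treating `completed`, `failed`, `cancelled`, and `error` as terminal is an assumption drawn from the descriptions above, not something the page states outright; `fetch_status` and `wait_until_done` are illustrative names, and the interval and timeout defaults are arbitrary.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumption: these four values mean the job will not change state again.
TERMINAL = {"completed", "failed", "cancelled", "error"}


def fetch_status(message_id: str, base_url: str = "http://localhost:8080") -> dict:
    """GET the status document for one job."""
    url = f"{base_url}/v1/messages/async/{message_id}/status"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wait_until_done(
    message_id: str,
    fetch: Callable[[str], dict] = fetch_status,
    interval: float = 1.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reaches a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch(message_id)
        if status["status"] in TERMINAL:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{message_id} still {status['status']} after {timeout}s")
        sleep(interval)
```

The `fetch` and `sleep` parameters exist only so the loop can be exercised without a running server.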

Cancel a job

Cancel a job while it is pending or processing:

curl -X POST http://localhost:8080/v1/messages/async/msg_01abc.../cancel

Clean up

Delete the job and free associated resources once you have consumed the result:

curl -X DELETE http://localhost:8080/v1/messages/async/msg_01abc.../delete
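The cancel and clean-up calls can be wrapped the same way in Python. This is a sketch using only the standard library; `cancel_request`, `delete_request`, and `send` are illustrative names, and the response bodies of these endpoints are not documented here, so only the HTTP status code is returned.

```python
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: same host/port as the curl examples


def cancel_request(message_id: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build POST /v1/messages/async/{id}/cancel (no request body)."""
    return urllib.request.Request(
        f"{base_url}/v1/messages/async/{message_id}/cancel", method="POST"
    )


def delete_request(message_id: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build DELETE /v1/messages/async/{id}/delete."""
    return urllib.request.Request(
        f"{base_url}/v1/messages/async/{message_id}/delete", method="DELETE"
    )


def send(request: urllib.request.Request) -> int:
    """Send a prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request) as resp:
        return resp.status
```

A typical sequence would be `send(cancel_request(message_id))` while the job is still pending or processing, followed by `send(delete_request(message_id))` once the result is no longer needed.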

Stream broker

Async chat streams are handled in-process — no separate worker required. The broker is configured under stream.broker in settings.yaml:

| Mode   | When to use                                                         |
|--------|---------------------------------------------------------------------|
| memory | Single-instance deployments and local development (default)         |
| redis  | Multi-instance or production; streams are shared across processes   |

stream:
  broker: memory # or redis

For Redis, configure the connection:

redis:
  host: ${PGPT_REDIS_HOST:localhost:6379}
  username: ${PGPT_REDIS_USERNAME:}
  password: ${PGPT_REDIS_PASSWORD:}
  database: ${PGPT_REDIS_DATABASE:0}
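The `${NAME:default}` placeholders appear to mean "use the environment variable NAME if it is set, otherwise the text after the first colon". The page does not spell out the exact rules, so the sketch below is only an illustration of that assumed behaviour, not the loader the server actually uses; `expand` is a hypothetical helper.

```python
import os
import re
from typing import Mapping

# Name, then the first colon, then everything up to the closing brace as the default.
_PLACEHOLDER = re.compile(r"\$\{(\w+):([^}]*)\}")


def expand(value: str, env: Mapping[str, str] = os.environ) -> str:
    """Replace each ${NAME:default} with env[NAME], falling back to the default."""
    return _PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(2)), value)


# With no variables set, the defaults from the snippet above are used.
print(expand("${PGPT_REDIS_HOST:localhost:6379}", env={}))
print(expand("${PGPT_REDIS_DATABASE:0}", env={"PGPT_REDIS_DATABASE": "2"}))
```

Under this reading, the default for the host keeps its own colon, so an unset PGPT_REDIS_HOST yields `localhost:6379`.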