Initiate Async Chat Stream

POST https://host.com/v1/messages/async
curl -X POST https://host.com/v1/messages/async \
  -H "Content-Type: application/json" \
  -d '{
    "model": "model",
    "messages": [
      {
        "role": "assistant",
        "content": "content"
      }
    ],
    "max_tokens": 1
  }'
{
  "message_id": "msg_async_12345",
  "status": "pending",
  "message": "Request initiated successfully"
}

Initiate an asynchronous chat completion stream.

This endpoint starts an asynchronous chat completion process that streams events. Unlike synchronous chat, this endpoint returns immediately with a message_id that can be used to observe the stream progress.

Key Features:

  • Asynchronous Processing: Non-blocking request handling with immediate response
  • Stream Observation: Use returned message_id to observe real-time events
  • Behaves like synchronous chat, except that the response is delivered asynchronously

Notes:

  • Optional message_id query parameter for custom stream identification
  • Stream events follow the same format as synchronous chat responses
  • Stream status can be monitored via the status endpoint

Query parameters

message_id (string, Optional)
Optional custom identifier for the stream. If not provided, a unique ID will be generated automatically.
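
As an illustration of where the optional message_id goes, here is a minimal Python sketch that builds (but does not send) the request. It uses only the standard library. The host is the placeholder from the example on this page, the helper name build_initiate_request is hypothetical, and no authentication header is set because this page does not document one.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://host.com"  # placeholder host from the example on this page


def build_initiate_request(payload: dict, message_id: Optional[str] = None) -> Request:
    """Build (but do not send) a POST /v1/messages/async request."""
    url = f"{BASE_URL}/v1/messages/async"
    if message_id is not None:
        # message_id travels in the query string, not in the JSON body.
        url += "?" + urlencode({"message_id": message_id})
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Payload values are copied from the curl example on this page.
req = build_initiate_request(
    {
        "model": "model",
        "messages": [{"role": "assistant", "content": "content"}],
        "max_tokens": 1,
    },
    message_id="my-stream-001",  # hypothetical custom identifier
)
print(req.full_url)
```

Passing the result to urllib.request.urlopen would send it. Omitting message_id leaves the URL without a query string, in which case the server generates the identifier.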

Request

This endpoint expects an object.
model (string, Required, defaults to default)
Model identifier or alias.
messages (list of objects, Required)
Conversation messages for the request.
max_tokens (integer, Required, >=1)
Maximum number of tokens to generate in the response.
system (list of objects, Optional)

System prompt input. Accepts str, list[str], System, list[System], or null. It is normalized internally to list[System].
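
The normalization described above can be pictured with a short Python sketch. This is not the server's implementation: the System class below is a hypothetical stand-in with a single text field (the real schema is not shown on this page), and mapping null to an empty list is an assumption.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class System:
    # Hypothetical stand-in: the real System schema is not shown on this page.
    text: str


SystemInput = Union[str, System, List[Union[str, System]], None]


def normalize_system(value: SystemInput) -> List[System]:
    """Coerce any accepted input form into a list of System objects."""
    if value is None:
        return []  # assumption: null becomes an empty list
    items = value if isinstance(value, list) else [value]
    # Wrap bare strings; pass existing System objects through unchanged.
    return [item if isinstance(item, System) else System(text=item) for item in items]
```

For example, normalize_system("Be brief") and normalize_system(["Be brief"]) both yield a one-element list, so downstream code only has to handle a single shape.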

tools (list of objects, Optional)
Optional tool definitions.
thinking (object, Optional)
Thinking configuration.
tool_choice (object, Optional)
Tool selection policy.
output_config (object, Optional)
Optional output configuration options.
cache_control (object, Optional)
Optional request-level cache control.
stream (boolean, Optional)
Whether to stream the response back to the client.
tool_context (list of objects, Optional)
Context to provide to the tools, such as documents, database connection strings, or other data relevant to tool usage.
mcp_servers (list of objects, Optional)
List of MCP servers to use for tool retrieval. Each server can have its own configuration.
container (string, Optional)
Container identifier for reuse across requests.
response_format (object, Optional)

Deprecated response format. Use output_config.format instead.

priority (integer, Optional)
Priority of the request, used for prioritizing responses.
seed (integer, Optional)
Random seed for reproducibility.
min_p (double, Optional)
Minimum probability threshold for token selection. Tokens with probability below this value are filtered out.
top_p (double, Optional, 0-1)
Nucleus sampling parameter. Only tokens with cumulative probability up to this value are considered.
temperature (double, Optional, 0-1)
Controls randomness in generation. Higher values make output more random, lower values more deterministic.
top_k (integer, Optional, >=0)
Limits token selection to the top K most likely tokens at each step.
repetition_penalty (double, Optional)
Penalty applied to tokens that have already appeared in the sequence to reduce repetition.
presence_penalty (double, Optional)
Penalty applied based on whether a token has appeared in the text, encouraging topic diversity.
frequency_penalty (double, Optional)
Penalty applied based on how frequently a token appears in the text, reducing repetitive content.
stop_sequences (list of strings, Optional)
Custom stop sequences that stop generation when matched.
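
The numeric constraints listed so far (max_tokens >=1, top_p 0-1, temperature 0-1, top_k >=0) can be checked client-side before sending, which avoids a round trip that would probably end in a 422. The Python sketch below is one way to do that. The helper name is hypothetical, and treating the 0-1 bounds as inclusive is an assumption.

```python
from numbers import Real

# (field, lower bound, upper bound or None), as documented on this page.
# Bounds are treated as inclusive, which is an assumption.
DOCUMENTED_RANGES = [
    ("max_tokens", 1, None),
    ("top_p", 0, 1),
    ("temperature", 0, 1),
    ("top_k", 0, None),
]


def out_of_range_fields(payload: dict) -> list:
    """Return names of present fields whose values fall outside the documented range."""
    bad = []
    for name, low, high in DOCUMENTED_RANGES:
        value = payload.get(name)
        if value is None:
            continue  # field omitted or null; presence is not checked here
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, Real):
            bad.append(name)
        elif value < low or (high is not None and value > high):
            bad.append(name)
    return bad
```

This only checks ranges. It does not verify that required fields such as max_tokens are present, nor that integer fields are whole numbers.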
metadata (object, Optional)

Request metadata (for example, user_id).

service_tier (enum, Optional)

Service tier preference (for example, “auto” or “standard_only”).

Allowed values:
inference_geo (string, Optional)
Geographic region hint for inference processing.
correlation_id (string, Optional)
Correlation ID for tracking the request across systems.
maximum_loaded_skills (integer, Optional, >=1)
Optional cap for concurrently loaded skills in a conversation. When exceeded, the oldest loaded skill is evicted.
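
To make the eviction rule concrete, here is a toy Python model of a capped skill set that drops the oldest loaded skill once the cap is exceeded. It only illustrates the behaviour described above. The class and method names are hypothetical, how the service actually tracks load order is not documented here, and the handling of a skill that is loaded twice is an assumption.

```python
from collections import OrderedDict
from typing import List, Optional


class LoadedSkills:
    """Toy model: a capped set of skills that evicts the oldest loaded one."""

    def __init__(self, maximum_loaded_skills: int):
        if maximum_loaded_skills < 1:
            raise ValueError("maximum_loaded_skills must be >= 1")
        self.cap = maximum_loaded_skills
        self._skills = OrderedDict()  # insertion order == load order

    def load(self, name: str) -> Optional[str]:
        """Load a skill; return the name of the evicted skill, or None."""
        if name in self._skills:
            return None  # assumption: re-loading does not refresh a skill's age
        self._skills[name] = True
        if len(self._skills) > self.cap:
            evicted, _ = self._skills.popitem(last=False)  # pop the oldest entry
            return evicted
        return None

    def names(self) -> List[str]:
        """Return currently loaded skill names, oldest first."""
        return list(self._skills)
```

With a cap of 2, loading "a", "b", then "c" evicts "a" and leaves "b" and "c" loaded.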
context_management (any, Optional)

Response

This endpoint returns an object.
message_id (string)
Unique identifier for the initiated stream
status (enum)

Initial status of the stream (typically ‘pending’)

message (string, defaults to "Request initiated successfully")
Confirmation message for successful stream initiation
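
A client will usually want to pull message_id out of this object so it can observe the stream later. The Python sketch below parses the three documented fields into a small dataclass. The class and function names are hypothetical, and the sample body is the example response from this page.

```python
import json
from dataclasses import dataclass

DEFAULT_MESSAGE = "Request initiated successfully"  # documented default


@dataclass
class InitiateResponse:
    message_id: str
    status: str
    message: str = DEFAULT_MESSAGE


def parse_initiate_response(body: str) -> InitiateResponse:
    """Parse the JSON body returned by POST /v1/messages/async."""
    data = json.loads(body)
    return InitiateResponse(
        message_id=data["message_id"],
        status=data["status"],
        # Fall back to the documented default if the field is absent.
        message=data.get("message", DEFAULT_MESSAGE),
    )


sample = '{"message_id": "msg_async_12345", "status": "pending", "message": "Request initiated successfully"}'
parsed = parse_initiate_response(sample)
print(parsed.message_id, parsed.status)
```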

Errors

401 Unauthorized
422 Unprocessable Entity
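
One way to surface these two documented errors in client code is to map each status code to its own exception type. The Python sketch below does that. All class and function names are hypothetical, and the catch-all branch for other 4xx/5xx codes is an addition, since this page lists only 401 and 422.

```python
class AsyncChatError(Exception):
    """Base class for errors returned by the async messages endpoint."""


class UnauthorizedError(AsyncChatError):
    """HTTP 401: credentials are missing or invalid."""


class UnprocessableEntityError(AsyncChatError):
    """HTTP 422: the request body or parameters failed validation."""


def raise_for_status(status_code: int, body: str = "") -> None:
    """Raise a typed exception for error responses; return None otherwise."""
    if status_code == 401:
        raise UnauthorizedError(body or "Unauthorized")
    if status_code == 422:
        raise UnprocessableEntityError(body or "Unprocessable Entity")
    if status_code >= 400:
        # Not documented on this page; handled generically as a precaution.
        raise AsyncChatError(f"HTTP {status_code}: {body}")
```

Callers can then catch UnprocessableEntityError to fix and resubmit a payload, while treating UnauthorizedError as a configuration problem.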