Streaming
Token360 supports streaming responses for chat completions, delivering tokens in real time as the model generates them. In interactive applications, this lets users start reading output before the full response is complete.
How Streaming Works
When you set stream: true in your chat completion request, Token360 returns the response as Server-Sent Events (SSE) instead of a single JSON object. Each event contains a small chunk of the response, typically one or a few tokens.
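As a rough illustration of the request described above, the following Python sketch builds a chat completion request with stream set to true and reads the response line by line using only the standard library. The helper names build_stream_request and iter_sse_lines are hypothetical and not part of any Token360 SDK; the endpoint URL and model name are the ones used in the examples on this page.

```python
import json
import urllib.request

API_URL = "https://api.token360.ai/v1/chat/completions"


def build_stream_request(api_key: str, prompt: str, model: str = "glm-5.1") -> urllib.request.Request:
    """Build a chat completion request with streaming enabled (hypothetical helper)."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # ask for Server-Sent Events instead of a single JSON object
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def iter_sse_lines(request: urllib.request.Request):
    """Yield decoded, non-empty lines from the response as they arrive."""
    with urllib.request.urlopen(request) as response:
        for raw_line in response:  # the HTTP response object iterates line by line
            line = raw_line.decode("utf-8").strip()
            if line:
                yield line
```

Calling iter_sse_lines(build_stream_request("sk-your-api-key", "Write a haiku about coding.")) would then produce the raw data: lines one at a time, ready to be parsed.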
Basic Usage
curl -N https://api.token360.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5.1",
    "messages": [{"role": "user", "content": "Write a haiku about coding."}],
    "stream": true
  }'

SSE Response Format
Each SSE event is a line starting with data: followed by a JSON object:
data: {"id":"your-chat-completion-id","object":"chat.completion.chunk","created":1677652288,"model":"glm-5.1","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"your-chat-completion-id","object":"chat.completion.chunk","created":1677652288,"model":"glm-5.1","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"id":"your-chat-completion-id","object":"chat.completion.chunk","created":1677652288,"model":"glm-5.1","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}
data: {"id":"your-chat-completion-id","object":"chat.completion.chunk","created":1677652288,"model":"glm-5.1","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]

Key Fields
- delta.role: Always "assistant". Token360 includes it in every chunk for consistency.
- delta.content: The text token(s) for this chunk. May be null or absent (e.g., in the final chunk).
- finish_reason: null while generating, then "stop" (or "length", "tool_calls") in the final chunk.
- usage: Included in the final chunk before [DONE] with token counts and cost details.
- provider: Name of the model provider that served the request (e.g., "Parasail").
- [DONE]: Signals the end of the stream.
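The fields above can be folded into a complete response with a small parser. The sketch below is a minimal example, not an official client: collect_stream is a hypothetical name, and it assumes a single choice per chunk and that usage, when present, sits at the top level of a chunk.

```python
import json


def collect_stream(lines):
    """Fold SSE lines into the full text, the finish reason, and usage (if any)."""
    text_parts = []
    finish_reason = None
    usage = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel; it is not JSON
        chunk = json.loads(payload)
        usage = chunk.get("usage") or usage  # assumed to be a top-level field
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:  # content may be null or absent
                text_parts.append(content)
            finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(text_parts), finish_reason, usage
```

Any iterable of strings works as input, so the same function can be fed lines from a live HTTP response or from a recorded stream during testing.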
Usage Information
Token usage is included in the final chunk (when supported by the provider):
{
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 45,
    "total_tokens": 57
  }
}

Performance Benefits
Streaming is recommended for:
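- Chat interfaces: Users see responses appear in real time.
- Long-form generation: Avoids timeouts while waiting for lengthy outputs to complete.
- Latency-sensitive applications: A low time-to-first-token (TTFT) lets users see output begin almost immediately instead of after the full response has been generated.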
Token360 monitors TTFT and total latency for all streaming requests. Both metrics are visible in the console analytics dashboard.
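TTFT and total latency can also be measured on the client side. The sketch below is one possible approach and says nothing about how the dashboard computes its figures: measure_stream is a hypothetical helper that times any iterable of text chunks, and the clock parameter exists only so the timing can be replaced in tests.

```python
import time


def measure_stream(chunks, clock=time.perf_counter):
    """Consume text chunks; return (text, ttft_seconds, total_seconds)."""
    start = clock()
    first_chunk_at = None
    parts = []
    for chunk in chunks:
        if first_chunk_at is None and chunk:
            first_chunk_at = clock()  # first non-empty chunk marks TTFT
        parts.append(chunk)
    end = clock()
    ttft = None if first_chunk_at is None else first_chunk_at - start
    return "".join(parts), ttft, end - start
```

For the TTFT figure to be meaningful, chunks should be a lazy generator that sends the request when iteration starts, so that the timer covers the network round trip as well as generation.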
Error Handling in Streams
If an error occurs mid-stream, it will be delivered as a final SSE event:
data: {"error":{"message":"Rate limit exceeded","type":"rate_limit_error","code":"rate_limit"}}
data: [DONE]

Always check for error objects in your stream processing logic.
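One way to perform this check is to inspect every event for an error key before reading its choices. In the sketch below, StreamError and iter_content are hypothetical names chosen for illustration, and the error object is assumed to have the message, type, and code fields shown in the example event above.

```python
import json


class StreamError(Exception):
    """Raised when the stream carries an error object (hypothetical class)."""

    def __init__(self, error):
        super().__init__(error.get("message", "unknown stream error"))
        self.type = error.get("type")
        self.code = error.get("code")


def iter_content(lines):
    """Yield content strings; raise StreamError if an error event appears."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # normal end of stream
        event = json.loads(payload)
        if "error" in event:
            raise StreamError(event["error"])
        for choice in event.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                yield content
```

Because the error is raised from inside the generator, any text received before the failure has already been yielded, and the caller can decide whether to keep the partial output or retry.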