Streaming

Token360 supports streaming responses for chat completions, delivering tokens in real time as the model generates them. Interactive applications can display output as it arrives instead of waiting for the full response.

How Streaming Works

When you set stream: true in your chat completion request, Token360 returns the response as Server-Sent Events (SSE) instead of a single JSON object. Each event contains a small chunk of the response, typically one or a few tokens.

Basic Usage

curl -N https://api.token360.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5.1",
    "messages": [{"role": "user", "content": "Write a haiku about coding."}],
    "stream": true
  }'
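The same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names `build_stream_request` and `print_raw_stream` are hypothetical, and the URL, headers, and body simply mirror the curl example above.

```python
import json
import urllib.request

API_URL = "https://api.token360.ai/v1/chat/completions"


def build_stream_request(api_key: str, prompt: str, model: str = "glm-5.1") -> urllib.request.Request:
    """Build the same POST request as the curl example, with streaming enabled."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # ask for Server-Sent Events instead of one JSON object
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def print_raw_stream(request: urllib.request.Request) -> None:
    """Send the request and print each non-empty SSE line as it arrives (network call)."""
    with urllib.request.urlopen(request) as response:
        for raw_line in response:  # the response object yields one line at a time
            line = raw_line.decode("utf-8").rstrip("\r\n")
            if line:
                print(line)
```

Calling `print_raw_stream(build_stream_request("sk-your-api-key", "Write a haiku about coding."))` should print the raw `data:` lines, much like `curl -N` does.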

SSE Response Format

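Each SSE event is a line starting with data: followed by a JSON object. Events are separated by a blank line, and the stream ends with a data: [DONE] sentinel, which is not JSON: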

data: {"id":"your-chat-completion-id","object":"chat.completion.chunk","created":1677652288,"model":"glm-5.1","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"your-chat-completion-id","object":"chat.completion.chunk","created":1677652288,"model":"glm-5.1","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"your-chat-completion-id","object":"chat.completion.chunk","created":1677652288,"model":"glm-5.1","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"your-chat-completion-id","object":"chat.completion.chunk","created":1677652288,"model":"glm-5.1","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
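A parser for this format can be quite small. The sketch below assumes the stream has already been decoded into text lines; `iter_sse_chunks` is a hypothetical helper name, not part of any Token360 SDK.

```python
import json
from typing import Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed JSON payloads from SSE lines until the [DONE] sentinel."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data line
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end of stream; the sentinel must not be passed to json.loads
        yield json.loads(payload)
```

Writing it as a generator lets the caller process each chunk as soon as its line arrives, rather than buffering the whole response.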

Key Fields

  • delta.role — Always "assistant". Token360 includes it in every chunk for consistency.
  • delta.content — The text token(s) for this chunk. May be null or absent (e.g., in the final chunk).
  • finish_reason — null while generating, then "stop" (or "length", "tool_calls") in the final chunk.
  • usage — Included in the final chunk before [DONE] with token counts and cost details.
  • provider — Name of the model provider that served the request (e.g., "Parasail").
  • [DONE] — Signals the end of the stream.
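As an illustration of how these fields fit together, the following sketch joins the delta.content pieces of the first choice and records the last non-null finish_reason. It assumes the chunks are already parsed into dictionaries; `collect_text` is a hypothetical helper name.

```python
from typing import Iterable, Optional, Tuple


def collect_text(chunks: Iterable[dict]) -> Tuple[str, Optional[str]]:
    """Join delta.content pieces for choice 0 and return (text, finish_reason)."""
    parts = []
    finish_reason = None
    for chunk in chunks:
        for choice in chunk.get("choices") or []:
            if choice.get("index", 0) != 0:
                continue  # this sketch only follows the first choice
            delta = choice.get("delta") or {}
            content = delta.get("content")
            if content:  # content may be null or absent, e.g. in the final chunk
                parts.append(content)
            if choice.get("finish_reason") is not None:
                finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason
```

Because it only reads delta.content and finish_reason, the function behaves the same whether delta.role appears in one chunk or in all of them.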

Usage Information

Token usage is included in the final chunk (when supported by the provider):

{
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 45,
    "total_tokens": 57
  }
}
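Since usage is only present when the provider supports it, client code should treat it as optional. A minimal sketch, assuming already-parsed chunk dictionaries and a hypothetical helper name `find_usage`:

```python
from typing import Iterable, Optional


def find_usage(chunks: Iterable[dict]) -> Optional[dict]:
    """Return the last non-empty usage object seen in the stream, or None."""
    usage = None
    for chunk in chunks:
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage
```

Returning None instead of raising keeps the caller's billing or logging code from failing on providers that omit the field.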

Performance Benefits

Streaming is recommended for:

  • Chat interfaces — Users see responses appear in real-time.
  • Long-form generation — Avoid timeout issues with lengthy outputs.
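  • Latency-sensitive applications — Time-to-first-token (TTFT) is much lower, because output begins as soon as the first token is generated rather than after the full response is complete.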

Token360 monitors TTFT and total latency for all streaming requests, visible in the console analytics dashboard.
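TTFT can also be measured on the client side, which includes network latency that server-side numbers may not. The sketch below is one way to do it, assuming a lazy iterable of parsed chunks. `measure_ttft` is a hypothetical helper, and the timer starts when the function is called, so call it right as the request is sent.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def _has_content(chunk: dict) -> bool:
    """True if any choice in the chunk carries non-empty delta.content."""
    return any(
        (choice.get("delta") or {}).get("content")
        for choice in chunk.get("choices") or []
    )


def measure_ttft(
    chunks: Iterable[dict],
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Optional[float], float]:
    """Consume the stream and return (seconds to first content, total seconds)."""
    start = clock()
    ttft = None
    for chunk in chunks:
        if ttft is None and _has_content(chunk):
            ttft = clock() - start  # first chunk that actually contains text
    return ttft, clock() - start
```

The clock is injectable so the function can be tested without real delays. `time.monotonic` is the default because it is unaffected by system clock adjustments.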

Error Handling in Streams

If an error occurs mid-stream, it is delivered as the last data event before [DONE]:

data: {"error":{"message":"Rate limit exceeded","type":"rate_limit_error","code":"rate_limit"}}

data: [DONE]

Always check for error objects in your stream processing logic.
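One way to perform that check is to raise an exception as soon as an event contains an error object. This is a sketch under the assumption that error events have the shape shown above; `StreamError` and `iter_chunks_or_raise` are hypothetical names.

```python
import json
from typing import Iterable, Iterator


class StreamError(Exception):
    """Raised when the stream delivers an error event instead of a chunk."""

    def __init__(self, error: dict):
        super().__init__(error.get("message", "unknown stream error"))
        self.type = error.get("type")
        self.code = error.get("code")


def iter_chunks_or_raise(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed chunks from SSE lines; raise StreamError on an error event."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        event = json.loads(payload)
        if "error" in event:
            raise StreamError(event["error"])
        yield event
```

Chunks received before the error are still yielded, so the caller can decide whether to keep the partial output or discard it and retry.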
