Technology
Streaming
Streaming is the LLM API pattern of sending tokens to the client as they're generated, instead of waiting for the full response to finish. It materially improves perceived latency for chat and drafting use cases.
More detail
Without streaming: the user clicks submit, sees nothing for several seconds, then the full response appears at once. With streaming: the first tokens appear within 200-500 ms and the rest arrive as they're generated. Implementation options: server-sent events (SSE) or WebSockets. Aiprosol's chat widget uses SSE streaming for the Groq-backed Arora chat.
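A minimal sketch of the SSE wire format described above, in Python. The token list is a stand-in for an LLM backend, and the `[DONE]` sentinel is an assumption (a common convention, not something the original specifies); a real server would attach this generator to a streaming HTTP response.

```python
def sse_format(token: str) -> str:
    """Frame one token in the SSE wire format: a 'data:' line plus a blank line."""
    return f"data: {token}\n\n"

def stream_tokens(tokens):
    """Yield SSE-framed chunks as tokens become available from the model."""
    for tok in tokens:
        yield sse_format(tok)
    # Sentinel so the client knows the stream has ended (assumed convention).
    yield "data: [DONE]\n\n"

if __name__ == "__main__":
    # Client-side, an EventSource (or fetch with a reader) would consume
    # these chunks and append each token to the UI as it arrives.
    for chunk in stream_tokens(["Hello", " world", "!"]):
        print(chunk, end="")
```

The key property is that each chunk is flushed to the client immediately, so the first token renders long before generation finishes.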
