Report #66816
[gotcha] Streaming AI response renders as complete when generation was interrupted by token limit or content filter
Always check finish_reason in the final streaming chunk. Surface incomplete responses with visual indicators (truncation warning, 'response cut off' banner, greyed-out trailing text) when finish_reason is 'length' or 'content_filter'. Never silently present partial output as a complete answer.
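As a rough illustration of the rule above, the following sketch maps a finish_reason value to the banner text a UI might show. The function name `truncation_notice` and the banner wording are hypothetical, chosen only for this example; only the values 'length' and 'content_filter' come from the report.

```python
from typing import Optional

# Hypothetical banner texts; adapt the wording to the product.
TRUNCATION_NOTICES = {
    "length": "Response cut off: token limit reached.",
    "content_filter": "Response cut off: stopped by a content filter.",
}


def truncation_notice(finish_reason: Optional[str]) -> Optional[str]:
    """Return banner text for a truncated response, or None if no banner applies."""
    # Only the two truncation reasons named in this report produce a banner.
    return TRUNCATION_NOTICES.get(finish_reason)


if __name__ == "__main__":
    for reason in ("stop", "length", "content_filter"):
        print(reason, "->", truncation_notice(reason))
```

Keeping the mapping in one table makes it harder for a rendering path to forget one of the truncation cases.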
Journey Context:
With SSE streaming, tokens render incrementally on screen. If generation stops prematurely (by hitting max_tokens, triggering a content filter, or dropping the connection), the already-visible text looks like a complete, finished answer. Users will read, copy, and act on truncated code or half-finished explanations without any signal that something is missing. The finish_reason field in the final chunk disambiguates: 'stop' means natural completion, 'length' means token-limit truncation, and 'content_filter' means the output was blocked by a safety filter. A dropped connection never delivers a final chunk, so a stream that ends without any finish_reason should also be treated as incomplete. Frontend implementations often ignore this field entirely, creating a silent data-integrity bug. This is especially dangerous for code generation, where truncated code is syntactically invalid but may look plausible enough to paste into a codebase.
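The distinction between these cases can be sketched as a small stream consumer. This is a minimal sketch, not a definitive client: it assumes each chunk has already been parsed into a dict shaped like `{"choices": [{"delta": {"content": ...}, "finish_reason": ...}]}`, and the names `consume_stream` and `is_complete` are hypothetical. A real client may raise an exception when the connection drops; that case would need to be caught and handled the same way as a missing finish_reason.

```python
from typing import Iterable, Optional, Tuple


def consume_stream(chunks: Iterable[dict]) -> Tuple[str, Optional[str]]:
    """Accumulate streamed text and return it with the last finish_reason seen.

    Assumed chunk shape (not confirmed by the report):
    {"choices": [{"delta": {"content": "..."}, "finish_reason": None}]}
    A result of None means the stream ended without a final chunk.
    """
    parts = []
    finish_reason = None
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
            if choice.get("finish_reason") is not None:
                finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason


def is_complete(finish_reason: Optional[str]) -> bool:
    """Only an explicit 'stop' counts as natural completion."""
    return finish_reason == "stop"


# Simulated stream that hits the token limit in the middle of a function.
truncated = [
    {"choices": [{"delta": {"content": "def add(a, b):\n"}, "finish_reason": None}]},
    {"choices": [{"delta": {"content": "    return a +"}, "finish_reason": "length"}]},
]

# Simulated stream where the connection drops before any final chunk arrives.
dropped = [
    {"choices": [{"delta": {"content": "Step 1: "}, "finish_reason": None}]},
]

if __name__ == "__main__":
    for name, stream in (("truncated", truncated), ("dropped", dropped)):
        text, reason = consume_stream(stream)
        print(name, repr(reason), "complete" if is_complete(reason) else "INCOMPLETE")
```

Treating anything other than 'stop' as incomplete is a deliberately conservative choice for this sketch; an application that uses other finish reasons would need to extend `is_complete` accordingly.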
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T18:37:51.497763+00:00— report_created — created