Report #47518
[gotcha] Streaming response appears complete but was silently truncated by token limit
Always check finish_reason at stream end. If it is 'length', render a visible 'Continue generating' affordance and visually mark the response as incomplete. Never render a truncated response as final or copyable.
Journey Context:
When streaming, tokens arrive progressively and the UI renders them normally. The response looks fine until the stream terminates with finish_reason='length' instead of 'stop'. Most streaming implementations ignore this final signal, so users see a sentence that trails off mid-thought and assume the AI writes poorly. Worse, they accept an incomplete code block as complete and ship it. The counter-intuitive part: streaming makes truncation HARDER to detect, because progressive reveal masks incompleteness. With a non-streaming response, you would immediately notice that the text ends abruptly; with streaming, each new token feels like progress until the output suddenly stops. Truncation is especially dangerous for code generation, where a truncated function can still parse and run but behave differently from the complete one.
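The check described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes each streamed chunk is a plain dict shaped like `{"choices": [{"delta": {"content": ...}, "finish_reason": ...}]}` with a single choice, and the names `StreamResult`, `consume_stream`, and `render_footer` are hypothetical helpers invented for this example. Adapt the field access to whatever your client library actually yields.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class StreamResult:
    text: str
    finish_reason: Optional[str]  # None if the stream ended without one

    @property
    def hit_token_limit(self) -> bool:
        return self.finish_reason == "length"

    @property
    def safe_to_copy(self) -> bool:
        # Only a clean "stop" is treated as a final, complete response.
        return self.finish_reason == "stop"


def consume_stream(chunks: Iterable[dict]) -> StreamResult:
    """Accumulate streamed text and remember the last finish_reason seen.

    Assumption: chunk layout as described in the lead-in, one choice per chunk.
    """
    parts = []
    finish_reason = None
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue
        choice = choices[0]
        content = (choice.get("delta") or {}).get("content")
        if content:
            parts.append(content)
        if choice.get("finish_reason") is not None:
            finish_reason = choice["finish_reason"]
    return StreamResult("".join(parts), finish_reason)


def render_footer(result: StreamResult) -> str:
    """Return a placeholder for the UI affordance shown under the response."""
    if result.hit_token_limit:
        return "[incomplete: token limit reached] [Continue generating]"
    if not result.safe_to_copy:
        return "[incomplete: stream ended without a stop signal]"
    return "[Copy]"


# Simulated stream that is cut off mid-expression by the token limit.
demo_chunks = [
    {"choices": [{"delta": {"content": "def add(a, b):\n"}, "finish_reason": None}]},
    {"choices": [{"delta": {"content": "    return a +"}, "finish_reason": None}]},
    {"choices": [{"delta": {}, "finish_reason": "length"}]},
]
result = consume_stream(demo_chunks)
print(render_footer(result))
```

The sketch deliberately treats a missing finish_reason (for example, a dropped connection) as incomplete too, so the copy affordance only appears after an explicit 'stop'. Whether other finish reasons should also allow copying is a product decision this example does not settle.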
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:14:41.786871+00:00— report_created — created