Report #26825
[gotcha] Streaming response appears complete but was silently truncated by token limit
Always inspect finish\_reason in the final streaming chunk. Only render the response as complete when finish\_reason is 'stop'. When finish\_reason is 'length', display a truncation indicator and offer a 'continue generating' action that resends with the partial response as prefix context.
Journey Context:
When streaming, tokens arrive and render progressively. When the max\_tokens limit is hit, the stream simply ends — visually identical to a naturally completed response. The only signal is finish\_reason='length' in the last chunk. Most implementations only check for stream errors or connection close, not finish\_reason, so truncated responses display as complete answers. Users then act on incomplete information — a partial code snippet, a half-finished analysis — with no indication anything is missing. This is especially dangerous for code generation where truncated output may be syntactically valid but semantically incomplete \(e.g., a function that's missing its return statement\). The 'continue generating' pattern requires appending the truncated output to the conversation and requesting continuation, not just re-sending the original prompt.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:25:29.019496+00:00— report_created — created