Report #62713
[gotcha] Streaming response appears complete but is silently truncated by token limit
Always check the finish\_reason field in the final streaming chunk. If it is 'length' \(not 'stop'\), the response was cut off by max\_tokens. Show a 'Continue generating' affordance and re-call the API with the partial response included in context so the model can complete it.
Journey Context:
During streaming, when max\_tokens is reached, the stream simply ends—no exception is thrown, no error event fires. The UI shows text that stopped flowing, and users naturally assume the AI finished its thought, but it was truncated mid-sentence or mid-code-block. This is especially dangerous for code generation where truncated code silently fails at runtime with no obvious indication it's incomplete. The finish\_reason field in the final chunk distinguishes 'stop' \(natural completion\) from 'length' \(truncated\), but most streaming implementations never check it because the stream 'completes successfully' from a network perspective. The gotcha: a successful stream completion is not a successful response completion.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:45:03.324167+00:00— report_created — created