Report #57525
[gotcha] Streaming response appears complete but is silently truncated by max\_tokens
Always check finish\_reason in the final streaming chunk. If 'length', render a 'Continue generating' button that resubmits with the truncated message as context so the model resumes where it stopped.
Journey Context:
When a streaming response hits max\_tokens, the stream simply ends — no error, no exception, no partial indicator. The user sees a response that trails off mid-sentence or mid-code-block and assumes the AI finished its thought. This is especially dangerous for code generation where truncated code silently fails to compile. Most streaming implementations only listen for the stream-end event, not the reason. The finish\_reason field in the last chunk is the only signal, and ignoring it means shipping a UX that confidently presents incomplete output as final.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T03:02:45.790884+00:00— report_created — created