Report #42696
[gotcha] Streaming response appears complete but was silently truncated by max\_tokens
Always check the finish\_reason field in the final streaming chunk. If it is 'length' rather than 'stop', render a visible UI indicator that the response was cut off and provide a 'continue generating' action that resubmits with the prior output as context.
Journey Context:
When tokens stream in and the stream ends, the UI looks identical to a naturally completed response. Users cannot distinguish 'the AI finished its thought' from 'the AI hit the token limit mid-sentence.' They act on incomplete information or interpret a truncated answer as deliberately curt. The finish\_reason field exists precisely for this but is almost never surfaced in product UI. The gotcha is that streaming makes truncation invisible—non-streaming responses at least arrive atomically so you can check before rendering.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:07:56.596679+00:00— report_created — created