Report #94369
[gotcha] Streaming response displayed as complete when connection drops or max\_tokens is hit mid-generation
Always check the finish\_reason in the final streaming chunk. If it is not 'stop' \(e.g., it is 'length'\), or the stream ends without ever delivering a final chunk, mark the response as truncated in the UI with a visual indicator and offer a 'continue generating' action. Never present streamed tokens as a finalized answer until finish\_reason confirms completion.
Journey Context:
When streaming, tokens arrive incrementally and the UI renders them in real time. If the connection drops or max\_tokens truncates the response, the UI is left holding whatever partial text was streamed, which looks indistinguishable from a complete answer. Users copy, paste, and act on truncated code or half-finished analysis. The trap: most implementations check for thrown errors \(which fire on network failures\) but not for silent truncation, where no error is raised. The finish\_reason field exists precisely to signal whether generation completed normally, but it is only available in the final chunk, which many codepaths never inspect. Buffering the full response before display would fix this but defeats the UX purpose of streaming. Checking finish\_reason is the minimal correct approach: surface truncation visibly, never silently.
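The check described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes each chunk is a plain dict shaped like an OpenAI-style chat-completions streaming chunk \(choices\[0\].delta.content and choices\[0\].finish\_reason\), and the names StreamResult, consume\_stream, and status\_label are hypothetical. Adapt the field access to whatever your client library actually yields.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class StreamResult:
    text: str
    finish_reason: Optional[str]  # None: no final chunk was ever seen
    error: Optional[str] = None   # set if the transport raised mid-stream

    @property
    def complete(self) -> bool:
        # Only an explicit 'stop' with no transport error counts as finished.
        return self.error is None and self.finish_reason == "stop"


def consume_stream(
    chunks: Iterable[dict],
    on_token: Callable[[str], None] = lambda token: None,
) -> StreamResult:
    """Render tokens as they arrive, but remember how the stream ended."""
    parts = []
    finish_reason = None
    error = None
    try:
        for chunk in chunks:
            choices = chunk.get("choices") or []
            if not choices:
                continue  # assumed: some chunks may carry no choices
            choice = choices[0]
            token = (choice.get("delta") or {}).get("content")
            if token:
                parts.append(token)
                on_token(token)  # live rendering hook
            if choice.get("finish_reason") is not None:
                finish_reason = choice["finish_reason"]
    except (ConnectionError, TimeoutError) as exc:
        # Keep the partial text, but record that the stream was cut off.
        error = repr(exc)
    return StreamResult("".join(parts), finish_reason, error)


def status_label(result: StreamResult) -> str:
    """Map the outcome to a label the UI can show next to the text."""
    if result.complete:
        return "complete"
    if result.error is not None or result.finish_reason is None:
        return "interrupted"
    return f"truncated ({result.finish_reason})"
```

A caller would pass the live stream to consume\_stream, render tokens through on\_token, and only drop the truncation indicator once status\_label returns "complete". Any other label is the cue to show the visual marker and the 'continue generating' action.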
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:59:00.376570+00:00— report_created — created