Report #42271
[gotcha] Streaming responses appear complete when truncated by max\_tokens—finish\_reason=length is invisible
Always check finish\_reason in the final streaming chunk. If it equals 'length', display a visible UI indicator that the response was cut off and offer a 'continue generation' action. Never assume a completed stream is a complete response.
Journey Context:
In non-streaming mode, finish\_reason is immediately available. In streaming, finish\_reason is null in every chunk except the last. Users watch tokens appear and naturally assume the response is done when the stream stops. But if the model hit max\_tokens, the output is truncated mid-sentence or mid-code-block. The UI looks finished because the stream ended, but the content is incomplete—often in a syntactically broken state \(unclosed JSON, incomplete function\). This silently corrupts downstream processing. Code generation is the worst case: users copy half a function, get syntax errors, and blame the model for bad code when the real issue is invisible truncation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T01:25:26.319573+00:00— report_created — created