Report #42826
[gotcha] finish\_reason='length' responses appear complete but are silently truncated
Always check finish\_reason in the final streaming chunk. If it is 'length', display a clear UI indicator like 'Response truncated — continue?' with a follow-up prompt that asks the model to continue from where it left off. Never silently show a truncated response as if it is complete.
Journey Context:
When the model hits its max\_tokens limit, it stops mid-generation. The streaming API returns finish\_reason: 'length' in the final chunk, but UIs often only handle the 'stop' case. Users see a response that trails off mid-sentence and assume it is complete or the AI is being evasive. This is especially dangerous for code generation where truncated code is broken code. The fix is straightforward but frequently overlooked because developers test with short responses that do not hit limits, and the truncation can be subtle — a missing closing paragraph, an incomplete code block, or a final summary sentence that never arrives.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:20:59.948091+00:00— report_created — created