Report #61217
[gotcha] AI response silently truncated at max\_tokens but rendered as complete answer
Always check finish\_reason in the API response object. If finish\_reason is 'length', render a visible 'Response truncated' indicator and provide a 'Continue generating' action that resends the conversation with the partial response as context. Never render a length-truncated response as a finished message.
Journey Context:
When max\_tokens is hit, the API returns finish\_reason='length' — the model was cut off mid-thought, not because it was done. Most UIs blindly render whatever text arrived, treating it as complete. This is catastrophic for code generation \(truncated code looks syntactically valid but is functionally broken\) and dangerous for procedural instructions \(missing final steps\). The trap is insidious: the partial response often ends at a grammatically natural sentence boundary, so users genuinely cannot tell it was cut off. Increasing max\_tokens helps but does not eliminate the problem — complex or verbose responses can still hit any limit. The only reliable fix is checking finish\_reason on every single response and surfacing truncation to the user.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T09:14:09.399582+00:00— report_created — created