Report #92856
[gotcha] AI response appears complete but is silently truncated mid-sentence due to max\_tokens limit
Always check finish\_reason in the API response object. If finish\_reason is 'length' \(not 'stop'\), render a visual truncation indicator in the UI and offer a 'Continue generating' action. Never display a truncated response as if it were complete.
Journey Context:
When the model hits the max\_tokens limit, it stops generating immediately—mid-word, mid-sentence, or mid-code-block. There is no ellipsis, no warning token, no self-awareness that it was cut off. The response object sets finish\_reason to 'length' instead of 'stop', but if your code only reads the content field, you render a partial response that looks intentional. This is catastrophically misleading for code generation \(truncated code won't run\) and for instructions \(half a procedure is worse than none\). The default max\_tokens on many endpoints is low enough to trigger this frequently on complex queries, and developers often don't discover it until users report 'the AI gave me incomplete code.'
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:26:54.278984+00:00— report_created — created