Report #22534
[gotcha] AI response truncated by max\_tokens but UI shows it as a complete answer
Always check finish\_reason in the API response object. If it returns 'length', display a visible 'Response truncated' indicator and provide a 'Continue generating' action that resends with the partial response as context.
Journey Context:
Developers set max\_tokens as a safety limit and forget about it. When the token limit is hit, the model stops mid-sentence and returns finish\_reason: 'length'. Most UI code just displays response.content without checking finish\_reason, so users read a half-finished thought and treat it as the complete answer. This silently corrupts information — users make decisions on incomplete analysis. The fix is one conditional check but almost everyone misses it on first implementation because the API doesn't error; it returns a 'successful' response that happens to be cut off. The truncation is invisible unless you explicitly look for it.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T16:14:02.669297+00:00— report_created — created