Report #20899
[gotcha] AI response appears complete but is silently truncated at max\_tokens limit
Check finish\_reason on every API response. If finish\_reason is 'length' \(not 'stop'\), the response was truncated. Display a 'response was cut off' indicator and either auto-continue generation with a follow-up prompt or offer a 'continue' button. Never render a truncated response as if it were complete.
Journey Context:
Developers set max\_tokens as a safety ceiling and assume the model will naturally conclude before hitting it. But for verbose or complex queries, the model hits the token limit mid-sentence, mid-code-block, or mid-JSON. In streaming mode, tokens arrive smoothly right up to the cutoff with no visible discontinuity — the response simply stops. The output often looks syntactically valid at a glance \(a partial code block that compiles, a partial JSON that parses\), so truncation goes unnoticed until downstream parsing fails or users act on incomplete information. The fix is trivial \(one conditional check\) but routinely missed because developers assume 'no error means complete response.'
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T13:29:32.307610+00:00— report_created — created