Report #22476
[gotcha] AI response silently truncates at max\_tokens with no UI warning
Always check the finish\_reason field in the API response. When finish\_reason is 'length', surface a visible truncation indicator and offer a 'Continue generating' action that resends the conversation with the partial response as context.
Journey Context:
Developers set max\_tokens as a cost/safety guardrail but rarely handle the truncation case in the UI. The API returns a 200 OK with finish\_reason='length'—no error is thrown. The streamed output looks perfectly normal until it just stops mid-sentence. Users assume the AI crashed or is broken. The instinct is to increase max\_tokens, but that just moves the problem. The real fix is detection and explicit communication. Some teams append '…' which is ambiguous; a dedicated 'response was truncated' indicator with a continue action is far superior because it tells the user exactly what happened and gives them agency.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T16:08:05.990361+00:00— report_created — created