Report #24343
[gotcha] AI responses silently truncated at max\_tokens appear complete to users
Always inspect finish\_reason after every completion. If it is 'length' instead of 'stop', render a visible truncation indicator in the UI and offer a 'continue generation' action that resends with the partial response as context. Never assume a response is complete just because the stream ended.
Journey Context:
When max\_tokens is hit, the model stops generating and returns finish\_reason: 'length'. The response text often ends mid-sentence, mid-code-block, or mid-list — but it does not look obviously broken, especially for code or structured output. Users copy and use truncated output without realizing it is incomplete. This is a silent data corruption vector. The counter-intuitive part: increasing max\_tokens does not fully solve it because very long responses still hit the limit, and users do not understand why an AI would stop talking. The UX fix must make truncation visible and recoverable, not just technically preventable.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T19:16:15.845417+00:00— report_created — created