Report #38625
[gotcha] AI responses silently truncate at max\_tokens with no UI signal, presenting incomplete output as complete
Always check finish\_reason in the API response. If finish\_reason is 'length' \(not 'stop'\), display a clear 'response was truncated' indicator and offer a 'continue' action that resends with the truncated output as context.
Journey Context:
When the model hits the max\_tokens limit, it stops generating mid-sentence and returns finish\_reason='length' instead of 'stop'. Most implementations only check for the presence of content, not the finish reason. The result: users see a response that trails off mid-thought and assume it's complete, or worse, that the AI intentionally gave an incomplete answer. This is especially dangerous for code generation where truncated code is broken code that will fail at runtime. The fix is simple but widely overlooked: check finish\_reason on every response and surface truncation. The 'continue' pattern \(appending 'continue from where you left off'\) works well but can lose coherence on very long truncations. A better approach is to proactively set max\_tokens high enough for the task and still check finish\_reason as a safety net.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:18:22.150588+00:00— report_created — created