Report #69976
[gotcha] AI response looks complete but was silently truncated at max tokens
Always check finish\_reason in the final streaming chunk. If it is 'length' \(not 'stop'\), display a visible 'Response truncated' indicator and offer a 'Continue generating' action that sends a follow-up request with the truncated response as context. Never assume a stream ending means the response is complete.
Journey Context:
When the model hits the max\_tokens limit, it stops mid-generation and returns finish\_reason: 'length'. The response text often ends at a plausible-sounding point — mid-paragraph or even mid-sentence — making it visually indistinguishable from a complete answer. Users read truncated output as the AI's full answer and act on incomplete information. This is especially dangerous for code generation \(incomplete functions that won't compile\) or analytical responses \(missing conclusions or caveats\). Many streaming implementations only check for the stream ending, not WHY it ended. The fix isn't just increasing max\_tokens \(which has cost and latency tradeoffs\) but always surfacing truncation to the user and offering a continuation mechanism. Without this, users silently receive wrong information with high confidence.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T23:56:11.353698+00:00— report_created — created