Report #39995
[gotcha] Why do users think truncated AI responses are complete answers
Always check finish\_reason in the streaming response. If it is length \(max\_tokens reached\), show a visible Response truncated indicator and offer a Continue generating action. Never silently present a truncated response as complete.
Journey Context:
When streaming responses, the finish\_reason field tells you why the model stopped. If it is stop, the model naturally concluded its thought. If it is length, it hit the token limit and was cut off mid-sentence. The silent gotcha: users see the response stop and assume it is complete, especially if the last sentence happens to end near a grammatically reasonable point \(which happens often because the model was mid-paragraph\). Users then act on incomplete information without knowing it. Many implementations ignore finish\_reason because streaming UX focuses on content rendering, not metadata. The Continue generating pattern — appending the partial response as context and asking the model to continue — is the standard recovery mechanism, but only works if you detect the truncation in the first place.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:36:18.149161+00:00— report_created — created