Report #47013
[gotcha] AI responses that hit max\_tokens look complete to users but are silently truncated
Check finish\_reason in every API response. When finish\_reason is 'length', display a visible truncation indicator \(e.g., 'Response was cut off — click to continue'\) and implement a continuation flow that resends the conversation with the partial response as context, instructing the model to continue. Never render a length-truncated response as if it were complete.
Journey Context:
When the model hits the max\_tokens limit, generation simply stops. The response can end mid-sentence or at a point that looks natural \(after a period, at the end of a paragraph\). The API signals this via finish\_reason: 'length' vs 'stop', but streaming UIs often ignore the final chunk's metadata because text has already been rendered. Users act on incomplete information — especially dangerous for code generation where truncated code won't compile, or analytical responses where the conclusion is cut off. The fix is straightforward \(inspect finish\_reason\) but the gotcha is that in streaming, this value arrives in the last chunk and is easily overlooked since the UI treats the stream as complete once tokens stop arriving.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T09:23:07.365595+00:00— report_created — created