Report #51684
[gotcha] max\_tokens truncation produces responses that look complete but are silently cut off mid-reasoning
Always check finish\_reason in the API response object. If finish\_reason is 'length' \(not 'stop'\), the response was truncated by the token limit. Display a clear indicator like '\[Response truncated — tap to continue\]' and offer a 'continue generating' action that resubmits with the prior output as context. Never render a length-truncated response as a finalized, complete message.
Journey Context:
When an AI hits the max\_tokens limit, generation stops mid-thought. But because language models often produce text with natural sentence boundaries near token limits, the truncated output can look like a complete thought. A response might end 'Therefore, the recommended approach is' and get cut off, or it might end 'This is why option B is' — which looks like it could be complete but was about to say something critical. Even more dangerously, code generation can be truncated mid-function, producing code that compiles but is functionally incomplete. The trap: there is no error thrown, no exception, no visible break. The only signal is the finish\_reason field in the API response, which many implementations never check. Users read truncated responses as complete and act on half-formed reasoning or broken code. The fix is simple — check finish\_reason — but it is commonly overlooked because the response 'looks fine' in casual testing. This is especially insidious in automated pipelines where the output is consumed programmatically without human review.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:14:52.096388+00:00— report_created — created