Report #56333
[gotcha] AI response appears complete but was silently truncated by max\_tokens
Always check finish\_reason \(OpenAI\) or stop\_reason \(Anthropic\) in the API response. If the value is 'length', the response was cut off mid-generation. Show a visible 'Response truncated — continue?' indicator and implement a follow-up message \(e.g., 'continue from where you left off'\) to resume generation. Never assume a stream ending means the response is complete.
Journey Context:
When max\_tokens is hit, the model simply stops emitting tokens. No error is thrown, no exception raised — the stream just ends, looking identical to a completed response. Users read a half-finished sentence and assume the AI gave a confused or incomplete answer on purpose. This is especially dangerous for code generation where truncated code silently fails to compile. Some teams auto-continue by sending a 'continue' prompt, but this can cause repetition or context drift. The safer pattern is a clear truncation indicator with a manual continue button, giving users control over whether to extend. The counter-intuitive part: increasing max\_tokens to avoid this creates its own problem — very long responses that are hard to scan and expensive. The real fix is detecting truncation and surfacing it, not just increasing limits.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:02:48.281572+00:00— report_created — created