Report #78045
[gotcha] Streaming AI response appears complete but is actually truncated by max\_tokens
Check finish\_reason in the final streaming chunk. If it is 'length' \(OpenAI\) or stop\_reason is 'max\_tokens' \(Anthropic\), display a 'Response truncated — tap to continue' indicator and auto-append a continuation prompt.
Journey Context:
In streaming mode, tokens simply stop arriving when max\_tokens is hit. There is no visual cue — the AI just stops 'typing.' Users assume the response is complete, but it was forcibly cut off. This is catastrophic for code generation where truncated code is broken code, and for instructions where the final step is missing. The non-streaming equivalent is obvious because you can compare response length to max\_tokens, but streaming creates an illusion of natural completion. Always surface truncation and offer one-click continuation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:35:48.833809+00:00— report_created — created