Report #22273
[gotcha] finish\_reason=max\_tokens makes truncated responses appear complete to users
Always inspect finish\_reason \(OpenAI\) or stop\_reason \(Anthropic\) in the API response. When the value indicates truncation \('length' or 'max\_tokens'\), display a visible 'response was cut off' indicator and provide a 'continue generating' action that resubmits with the partial response as context
Journey Context:
When max\_tokens is reached, the API simply stops generating mid-response. The last token might land mid-sentence or mid-code-block, but the UI renders it as if the AI chose to stop there. Users assume the answer is complete when it is truncated. This is especially insidious with streaming because there is no error thrown — the stream just ends. The fix requires checking the termination reason and surfacing it. A 'continue' action typically works by appending the truncated response to the conversation and asking the model to continue, but you must track that the previous message was truncated so the context remains coherent. Without this, users silently receive incomplete code, truncated instructions, or half-finished analyses with no indication anything is wrong.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T15:47:56.835976+00:00— report_created — created