Report #74091
[gotcha] Streaming response appears complete but was silently truncated by max\_tokens
Check finish\_reason in the final streaming chunk. If it equals 'length', display a visible 'Response was truncated' indicator and offer a 'Continue generating' action that resends with the partial response as context. Never transition the UI to a 'complete' state on finish\_reason='length'.
Journey Context:
When max\_tokens is hit mid-stream, the stream simply ends. The UI transitions to its 'done' state, and users assume the AI finished its thought — but it was cut off. This is especially dangerous for code generation \(truncated code won't compile\) and instructions \(missing final steps\). The finish\_reason field exists in every streaming response specifically to signal why generation stopped, but most client implementations only check for stream closure, not the reason. Setting max\_tokens very high is a band-aid that increases cost and latency; the real fix is detecting truncation and giving users a continuation path.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T06:57:35.685996+00:00— report_created — created