Report #73653
[gotcha] Streaming responses truncated by token limits or content filters appear complete to users
Always check finish_reason in the final streaming chunk. If it is length (the response hit max_tokens) or content_filter (the output was filtered), display a clear truncation indicator and offer a retry with adjusted parameters. Never treat the end of the stream as successful completion without checking this field.
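One way to act on this advice is to map the finish reason to a user-facing notice and a suggested retry. The sketch below is illustrative only: `truncation_notice`, `retry_params`, the notice wording, the doubling factor, and the 4096 ceiling are all hypothetical choices, not part of any SDK.

```python
from typing import Optional

# Hypothetical notice text; adapt the wording to your own UI.
TRUNCATION_NOTICES = {
    "length": "Response was cut off at the token limit. Retry with a higher limit?",
    "content_filter": "Response was stopped by a content filter and is incomplete.",
}


def truncation_notice(finish_reason: Optional[str]) -> Optional[str]:
    """Return a warning for truncating finish reasons, or None otherwise."""
    if finish_reason is None:
        return None
    return TRUNCATION_NOTICES.get(finish_reason)


def retry_params(params: dict, finish_reason: Optional[str],
                 factor: int = 2, ceiling: int = 4096) -> dict:
    """Suggest parameters for a retry; only the 'length' case is adjusted here."""
    adjusted = dict(params)  # copy so the caller's dict is not mutated
    if finish_reason == "length":
        # Assumed policy: grow max_tokens, capped at an arbitrary ceiling.
        adjusted["max_tokens"] = min(params["max_tokens"] * factor, ceiling)
    return adjusted


notice = truncation_notice("length")
suggested = retry_params({"max_tokens": 256}, "length")
```

A content_filter stop usually calls for a changed prompt rather than a larger token budget, so this sketch leaves the parameters unchanged in that case.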
Journey Context:
In non-streaming (batch) responses, finish_reason is easy to check. In streaming, it arrives in the last chunk (choices[0].finish_reason), after all tokens have been displayed. Developers focused on appending tokens often ignore this final metadata. The result: a response cut off at max_tokens looks complete to the user, who copies partial code or acts on incomplete analysis. The content_filter finish reason is even more dangerous: it means the model started generating something that was then blocked, leaving a response that trails off mid-sentence with no explanation. Always surface these states in the UI.
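A minimal sketch of a stream consumer that keeps the finish reason alongside the accumulated text is shown here. It assumes each chunk is a plain dict shaped like choices[0].delta.content plus choices[0].finish_reason; `StreamResult`, `consume_stream`, and the demo chunks are hypothetical, and a real client library may expose chunks as objects rather than dicts.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class StreamResult:
    text: str
    finish_reason: Optional[str]  # None: the stream never reported a reason


def consume_stream(chunks: Iterable[dict]) -> StreamResult:
    """Accumulate streamed text and remember the last finish_reason seen."""
    parts = []
    finish_reason = None
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue  # skip chunks that carry no choices (e.g. usage-only)
        choice = choices[0]
        content = (choice.get("delta") or {}).get("content")
        if content:
            parts.append(content)
        if choice.get("finish_reason") is not None:
            finish_reason = choice["finish_reason"]
    return StreamResult("".join(parts), finish_reason)


# Hypothetical chunks imitating a response cut off at max_tokens.
demo_chunks = [
    {"choices": [{"delta": {"content": "def add(a, b):"}, "finish_reason": None}]},
    {"choices": [{"delta": {"content": "\n    return a"}, "finish_reason": None}]},
    {"choices": [{"delta": {}, "finish_reason": "length"}]},
]

result = consume_stream(demo_chunks)
is_truncated = result.finish_reason in ("length", "content_filter")
```

Returning the finish reason together with the text makes it harder for calling code to render the response without also deciding whether to show a truncation indicator.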
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T06:13:26.963420+00:00— report_created — created