Report #36714
[gotcha] finish\_reason arrives only on the last streaming chunk, so you cannot show truncation or content-filter UX until the entire stream completes
Buffer finish\_reason as chunks arrive and act on it as soon as the stream completes. If finish\_reason is length, immediately append a "Response truncated, click to continue" affordance. To anticipate truncation, track token counts during streaming: as the running count approaches the configured max\_tokens, start preparing the truncation UX before the stream ends. If finish\_reason is content\_filter, replace the partial response with a graceful safety message so that no flagged content stays visible.
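A minimal sketch of the buffering approach, assuming a simplified chunk shape. The names `Chunk`, `RenderResult`, `consume_stream`, and both message strings are hypothetical and do not come from any particular SDK; real chunk objects will need adapting to this shape.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Chunk:
    """Hypothetical simplified chunk: a text delta plus finish_reason,
    which is None on every chunk except the last."""
    delta: str
    finish_reason: Optional[str] = None


@dataclass
class RenderResult:
    text: str                     # what the UI should show once the stream ends
    notice: Optional[str]         # extra affordance to append, if any
    finish_reason: Optional[str]  # None if the stream ended without one


# Placeholder wording, chosen for illustration only.
TRUNCATION_NOTICE = "Response truncated, click to continue"
SAFETY_MESSAGE = "This response was withheld by a content filter."


def consume_stream(chunks: Iterable[Chunk]) -> RenderResult:
    parts: List[str] = []
    finish_reason: Optional[str] = None

    for chunk in chunks:
        parts.append(chunk.delta)  # a real UI would also render the delta here
        if chunk.finish_reason is not None:
            finish_reason = chunk.finish_reason  # buffer it, act after the loop

    text = "".join(parts)

    if finish_reason == "length":
        # Keep the partial text and offer a way to continue.
        return RenderResult(text, TRUNCATION_NOTICE, finish_reason)
    if finish_reason == "content_filter":
        # Replace the partial text so flagged content does not stay visible.
        return RenderResult(SAFETY_MESSAGE, None, finish_reason)
    return RenderResult(text, None, finish_reason)
```

A stream that drops without ever sending a finish\_reason comes back with `finish_reason=None`, which the caller may want to treat as its own error state.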
Journey Context:
In non-streaming mode, you get the complete response, including finish\_reason, all at once, so you can immediately show the appropriate UX \(truncation warning, filter message, etc.\). In streaming mode, you receive chunks with finish\_reason=null until the very last chunk. This has three consequences. You cannot tell the user that the response was cut short until after they have read the truncated output and wondered why it ends mid-sentence. You cannot proactively switch to a "continue generating" state. And if the response was filtered mid-stream, you have already displayed content before knowing it was flagged. Developers often miss this because responses usually complete normally in testing. The fix requires a mindset shift: in streaming UX, post-processing the completed stream is just as important as rendering the chunks. Build a stream completion handler that checks finish\_reason and immediately augments the UI.
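Since finish\_reason stays null until the last chunk, the only mid-stream signal is a local estimate of how close the output is to max\_tokens. The sketch below shows one way to keep that estimate. `TruncationPredictor`, `warn_ratio`, and the default of roughly four characters per token are assumptions made for illustration; a real tokenizer would be more accurate.

```python
class TruncationPredictor:
    """Estimates output tokens from streamed text and flags when the
    running total nears max_tokens. Hypothetical helper, not an SDK class."""

    def __init__(self, max_tokens: int, warn_ratio: float = 0.9,
                 chars_per_token: int = 4):
        self.max_tokens = max_tokens
        self.warn_at = int(max_tokens * warn_ratio)
        self.chars_per_token = chars_per_token  # rough heuristic, an assumption
        self.estimated_tokens = 0

    def observe(self, delta: str) -> bool:
        """Record one chunk's text. Returns True once the estimate has
        reached the warning threshold, so the UI can prepare early."""
        if delta:
            # Count at least one token for any non-empty delta.
            self.estimated_tokens += max(1, len(delta) // self.chars_per_token)
        return self.estimated_tokens >= self.warn_at
```

Calling `observe` on each delta inside the render loop lets the UI pre-build the truncation affordance. The final decision should still come from the finish\_reason on the last chunk, because the estimate can be wrong in either direction.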
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T16:06:19.737503+00:00— report_created — created