Report #86479
[gotcha] Streaming tokens locks you into a response direction you cannot correct
Buffer the first 1-2 sentences of the response before streaming to the UI. Run a lightweight check on the buffered content (safety, relevance, hallucination heuristics) and only start streaming if it passes. If it fails, discard and regenerate rather than showing then retracting.
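A minimal sketch of the buffer-then-stream gate described above, assuming the model output arrives as an iterable of text chunks. The names `gated_stream`, `passes_check`, `min_sentences`, and `max_buffer_chars` are hypothetical and not part of any particular SDK; the sentence detection is a crude regex, not a real sentence splitter.

```python
import re
from typing import Callable, Iterable, Iterator, Optional

# Crude sentence boundary: ., ! or ? followed by whitespace (an assumption;
# swap in a proper sentence splitter if abbreviations or decimals matter).
_SENTENCE_END = re.compile(r"[.!?]\s")


def gated_stream(
    tokens: Iterable[str],
    passes_check: Callable[[str], bool],
    min_sentences: int = 2,
    max_buffer_chars: int = 400,
) -> Optional[Iterator[str]]:
    """Buffer the opening of a token stream, check it, then stream the rest.

    Returns None when the buffered opening fails the check, so the caller can
    discard the attempt and regenerate before the user has seen anything.
    """
    it = iter(tokens)
    buffered = []
    for tok in it:
        buffered.append(tok)
        text = "".join(buffered)
        enough_sentences = len(_SENTENCE_END.findall(text)) >= min_sentences
        # The character cap keeps a long first sentence from stalling the UI.
        if enough_sentences or len(text) >= max_buffer_chars:
            break

    opening = "".join(buffered)
    if not passes_check(opening):
        return None

    def _emit() -> Iterator[str]:
        yield opening   # flush the validated buffer in one piece
        yield from it   # then pass remaining tokens through unchanged

    return _emit()
```

A caller would render each yielded chunk as it arrives and treat a `None` return as the signal to start a new generation. Nothing past the buffered opening is checked in this sketch, so later errors still reach the screen.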
Journey Context:
The appeal of streaming is showing progress, but once you've rendered 'The correct answer is X' to the screen, you cannot take it back, even if the model realizes X is wrong and self-corrects a few tokens later. Non-streaming responses let you validate the full output before displaying anything. The tradeoff: buffering adds latency before the user sees anything, which works against time-to-first-token optimization. The right call is a small buffer window: enough to catch obviously wrong starts (refusals on safe prompts, factual howlers, format mismatches) without sacrificing the streaming experience for the 95% of responses that start fine.
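As an illustration of how cheap such an opening check can be, here is a hedged sketch that covers two of the failure modes named above: refusals on safe prompts and format mismatches. The function name `opening_looks_ok`, the `expect_json` flag, and the refusal phrase list are all assumptions made for this example; factual howlers need a separate check and are not handled here.

```python
# Illustrative refusal openers only; tune this list against real traffic.
REFUSAL_PREFIXES = ("i'm sorry", "i cannot", "i can't", "as an ai")


def opening_looks_ok(opening: str, expect_json: bool = False) -> bool:
    """Cheap heuristic check on the buffered opening of a response."""
    text = opening.strip()
    if not text:
        return False  # nothing useful was buffered

    # Normalize curly apostrophes so "I’m sorry" matches "i'm sorry".
    lowered = text.lower().replace("\u2019", "'")
    if lowered.startswith(REFUSAL_PREFIXES):
        return False  # looks like a refusal; only valid if the prompt was safe

    # Format mismatch: JSON was requested but the reply opens with prose.
    if expect_json and text[0] not in "{[":
        return False

    return True
```

Because the check is a handful of string operations, it adds almost nothing on top of the wait for the buffer window itself, which keeps most of the latency cost tied to the window size rather than to the validation.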
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T03:44:33.860168+00:00— report_created — created