Report #50556
[gotcha] AI starts generating a compliant response, then pivots to a refusal mid-stream, so users see partial compliance followed by rejection
Buffer at least the first sentence before streaming to the UI. If the buffered content contains a compliance signal ("Sure", "Here's", "I'll help"), hold it until generation completes or a safe threshold is reached. Handle mid-stream refusals by replacing the partial compliant text with a graceful refusal message rather than appending the refusal after it.
Journey Context:
With streaming, tokens reach the UI before the full response has been determined. A common pattern: the model opens with "Sure! Here's how to..." (compliance) and then, a few tokens later, its safety behavior kicks in and it pivots to "I apologize, but I cannot assist with...". The user sees both the compliance and the refusal, which is confusing: it looks like the AI changed its mind in real time. This is worse than a clean refusal because it sets a false expectation and then withdraws it. The fix is to buffer enough initial tokens to judge the compliance-versus-refusal trajectory before displaying anything. If the response is still compliant once the buffer threshold is reached, release it and continue streaming; if a refusal emerges at any point, replace everything displayed so far with a clean refusal message. Buffering adds a small latency cost but prevents the jarring compliance-then-refusal experience.
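The buffering approach described above can be sketched roughly as follows. This is a minimal illustration, not a verified implementation: the function name `guarded_stream`, the marker lists, the `safe_threshold` parameter, and the `("append", text)` / `("replace", text)` event shape are all hypothetical, and simple substring matching is assumed to be good enough to spot a refusal.

```python
import re
from typing import Iterable, Iterator, Tuple

# Hypothetical marker lists; tune them for the model and language in use.
COMPLIANCE_SIGNALS = ("sure", "here's", "i'll help")
REFUSAL_MARKERS = ("i cannot assist", "i can't assist", "i apologize, but")
GRACEFUL_REFUSAL = "Sorry, I can't help with that request."

Event = Tuple[str, str]  # ("append", text) or ("replace", text)


def guarded_stream(tokens: Iterable[str], safe_threshold: int = 200) -> Iterator[Event]:
    """Yield UI events, holding text back until its trajectory looks stable."""
    full = ""          # everything generated so far
    shown = 0          # how many characters of `full` the UI already has
    releasing = False  # becomes True once the buffered text may be displayed

    for token in tokens:
        full += token
        lowered = full.lower()

        if any(marker in lowered for marker in REFUSAL_MARKERS):
            # Replace whatever is displayed instead of appending the refusal.
            yield ("replace", GRACEFUL_REFUSAL)
            return

        if not releasing:
            if lowered.lstrip().startswith(COMPLIANCE_SIGNALS):
                # Compliance opener: hold until the (assumed) safe threshold.
                releasing = len(full) >= safe_threshold
            else:
                # Otherwise release once a first sentence looks complete.
                releasing = re.search(r"[.!?](\s|$)", full) is not None

        if releasing and shown < len(full):
            yield ("append", full[shown:])
            shown = len(full)

    # Generation finished without a refusal: flush anything still held.
    if shown < len(full):
        yield ("append", full[shown:])


if __name__ == "__main__":
    demo = ["Sure! ", "Here's how to ", "do it. ", "I apologize, but ", "I cannot assist."]
    for event in guarded_stream(demo):
        print(event)
```

The UI layer is expected to clear its current content when it receives a `"replace"` event. A larger `safe_threshold` lowers the chance of showing text that is later retracted, at the cost of more latency before the first visible token.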
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T15:20:37.684689+00:00— report_created — created