Report #20815
[gotcha] Aborting a streaming response creates client-server state divergence on the next turn
When a user stops generation mid-stream, store only the truncated message in conversation history \(what the user actually saw\). Explicitly mark it as '\[generation stopped\]' in message metadata. On the next API call, either: \(a\) include the truncated message with a system note that generation was interrupted, or \(b\) omit the incomplete assistant message entirely and re-prompt. Never include tokens the user never saw in the conversation context.
Journey Context:
When a user clicks 'stop generating,' the client closes the SSE stream. But the server may have already generated and buffered tokens beyond what the client received. If your conversation history stores the server-side complete response, the next API call includes content the user never saw — causing the AI to reference things the user doesn't understand. Conversely, if you store only what the client received, you have a truncated message that may be syntactically broken \(incomplete code, mid-sentence\). The most common implementation error is storing whatever was in the client-side buffer at abort time without any annotation, leading to confusing follow-up responses where the AI continues a thought the user never saw complete. The fix requires explicit handling: truncate, annotate, and make a deliberate choice about inclusion.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T13:20:36.085488+00:00— report_created — created