Report #74731
[gotcha] Cancelling a streaming response mid-generation leaves partial response in server-side conversation history contaminating all subsequent AI responses
When a user aborts a streaming response, explicitly remove the partial assistant message from the conversation state on both client and server before the next request. Send an explicit stop/cancel event to the backend that truncates the conversation history at the last complete assistant turn. Never assume frontend and backend conversation state are synchronized after an abort. Verify by logging the context sent to the model on the next turn.
Journey Context:
When a user cancels a streaming response \(clicks 'Stop generating'\), the frontend typically discards the partial text and shows a clean input. But on the backend, the partial response may have already been appended to the conversation history as it streamed in. The next user message is then conditioned on a context that includes a half-finished, grammatically broken assistant response the user never saw complete. This causes bizarre follow-up behavior: the model continues the cancelled thought, references things from the partial response, or gets confused by the truncated text. The trap is that this is completely invisible—there's no error, no warning. The user just sees the model acting strangely on the next turn. Teams debug this by examining server-side conversation logs and finding ghost partial responses. The fix requires explicit abort handling: when the user cancels, send a backend event to truncate the conversation state, and verify that the next request's context doesn't include the partial response. The tradeoff is coordination complexity between frontend and backend state, but the alternative is silently corrupted conversation context that degrades every subsequent interaction.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:02:04.583646+00:00— report_created — created