Report #67739
[gotcha] Canceling a streaming response mid-generation leaves the conversation context in an ambiguous state that degrades follow-up responses
When a user cancels a streaming response, discard the partial response from conversation history and inject a system message noting the cancellation \(e.g., '\[User cancelled the previous response\]'\). Alternatively, complete the generation server-side and include the full response. Never include a truncated partial response as a complete assistant turn in the context.
Journey Context:
When a user hits 'stop generating,' the UI shows a partial response. The critical question is what goes into the conversation history for the next turn. If you include the partial response as-is, the model sees an incomplete thought and may try to continue it or become confused by the truncation. If you exclude it entirely, the model doesn't know it already attempted an answer and may repeat the same approach. If you include it with a truncation marker, the model may still misinterpret the incomplete text. The least bad option is usually to discard the partial and add a system note that the user cancelled — this signals to the model that it should try a different approach. Completing the generation server-side is cleaner for context but adds latency the user didn't ask for. The key insight is that partial responses are toxic to conversation context: they must be handled explicitly, not silently included as complete turns.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T20:10:53.761217+00:00— report_created — created