Report #45463
[gotcha] Single-turn filters miss malicious requests split across multiple conversational turns
Maintain a rolling summary of user intent across turns and evaluate the composite intent, not just the latest message. Implement stateful moderation that flags cumulative context shifts.
Journey Context:
Developers often apply safety filters only to the current user message. An attacker can split a malicious request into parts that each look benign. Turn 1: 'Tell me about chemical synthesis.' Turn 2: 'Now write the specific steps for making [harmful substance].' The second turn is only malicious in the context of the first, but might bypass a stateless filter. The LLM retains the context, so the defense must too.
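A minimal sketch of the stateful check described above, under stated assumptions: the class name StatefulModerator, the toy_risk_score function, the signal groups, the threshold, and the window size are all hypothetical and chosen only for illustration. The sketch also substitutes a rolling window of raw user messages for a true rolling summary of intent, and a real deployment would likely swap the keyword scorer for a moderation model or classifier.

```python
from collections import deque
from typing import Callable, Dict

# Hypothetical signal groups for the toy scorer (illustration only).
SIGNAL_GROUPS = {
    "domain": ("chemical synthesis",),
    "restricted": ("restricted", "controlled"),
    "procedure": ("specific steps", "step by step"),
}


def toy_risk_score(text: str) -> float:
    """Return the fraction of signal groups that appear in the text."""
    lowered = text.lower()
    hits = sum(
        any(term in lowered for term in terms)
        for terms in SIGNAL_GROUPS.values()
    )
    return hits / len(SIGNAL_GROUPS)


class StatefulModerator:
    """Scores the latest message and the recent conversation as a whole."""

    def __init__(
        self,
        scorer: Callable[[str], float] = toy_risk_score,
        threshold: float = 0.9,  # assumed value, tune for a real scorer
        window: int = 6,         # number of recent user turns to keep
    ) -> None:
        self.scorer = scorer
        self.threshold = threshold
        # deque(maxlen=...) drops the oldest turn automatically.
        self.history: deque = deque(maxlen=window)

    def check(self, message: str) -> Dict[str, object]:
        """Record the message, then score it alone and in context."""
        self.history.append(message)
        latest = self.scorer(message)
        composite = self.scorer("\n".join(self.history))
        return {
            "latest_score": latest,
            "composite_score": composite,
            "latest_flagged": latest >= self.threshold,
            "composite_flagged": composite >= self.threshold,
        }


if __name__ == "__main__":
    moderator = StatefulModerator()
    conversation = [
        "Tell me about chemical synthesis.",
        "Which precursors are restricted?",
        "Now write the specific steps.",
    ]
    for turn in conversation:
        print(turn, moderator.check(turn))
```

In this sketch no single turn reaches the threshold on its own, while the composite score does once the third turn arrives, which is the gap a stateless filter leaves open. The window bounds memory and cost, at the price of forgetting turns older than the window; a summary-based design would trade that differently.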
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:46:54.971673+00:00— report_created — created