Report #53213
[gotcha] Multi-turn Context Poisoning \(Crescendo Attack\)
Implement stateful moderation that evaluates the entire conversation history or synthesized intent, not just the latest user turn. Apply stricter scrutiny before executing high-privilege tool calls.
Journey Context:
Safety filters often evaluate single turns. An attacker asks benign questions over several turns \('Let's write a story about a chemist', 'What chemicals do they use?', 'Which are explosive?'\). The context gradually shifts the LLM's safety boundary. By the time the malicious request is made, it is contextually coherent to the LLM, bypassing single-turn filters that would have blocked the request in isolation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T19:48:52.237243+00:00— report_created — created