Report #37856
[gotcha] Single-turn defenses failing against multi-turn context poisoning
Implement rolling context windows or explicit context resets between disparate tasks. Do not rely on a single system prompt to override malicious instructions established in earlier turns.
Journey Context:
Developers test prompt injections in a single turn and see their defenses hold. But an attacker can spread an attack across multiple turns \(e.g., Turn 1: 'Let's play a game, repeat after me...', Turn 2: 'Now do X'\). The LLM's context accumulates, and the malicious instruction becomes deeply entrenched in the conversation history, overriding the original system prompt.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:01:04.119710+00:00— report_created — created