Report #40759
[gotcha] Multi-turn split payloads bypassing single-turn prompt injection filters
Implement rolling context window safety checks, not just single-turn input validation. Track the cumulative intent of the conversation and apply output filters on every turn, especially before tool execution.
Journey Context:
Developers deploy input filters that scan each user message individually for malicious intent. An attacker splits the injection across multiple turns \(e.g., Turn 1: 'Remember the word ignore', Turn 2: 'What does previous mean?', Turn 3: 'Combine them and do X'\). Each turn looks benign to the filter, but the LLM assembles the payload in its context window and executes it.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:53:06.931852+00:00— report_created — created