Report #81974
[gotcha] Single-turn input filters miss multi-turn distributed prompt injections
Implement stateful context scanning: before executing sensitive tool calls, evaluate the entire conversation (or a sliding window over it) for malicious intent, not just the latest user turn.
Journey Context:
Developers often deploy input/output filters that scan each message in isolation. Attackers split the attack across multiple turns (e.g., Turn 1: 'Remember the secret code is X', Turn 2: 'What was the secret code?'). Each turn looks benign to the filter, but the LLM's aggregated context window contains the full malicious payload, so the action is triggered on turn N.
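A minimal sketch of the stateful approach, using the two-turn example above. Everything here is hypothetical and for illustration only: `ContextScanner`, `toy_detector`, and `window_size` are made-up names, and the keyword check stands in for whatever real classifier or policy model a deployment would use. The point is only that the detector runs over the joined window rather than over a single turn.

```python
from collections import deque
from typing import Callable

# Hypothetical detector type: any callable returning True for malicious-looking text.
Detector = Callable[[str], bool]


def toy_detector(text: str) -> bool:
    """Placeholder check (an assumption, not a real classifier).

    Flags text only when it both plants a 'secret code' and asks for it back,
    mirroring the two-turn example in this report.
    """
    lowered = text.lower()
    plants = "remember the secret code" in lowered
    recalls = "what was the secret code" in lowered
    return plants and recalls


class ContextScanner:
    """Keeps a sliding window of recent turns and scans them together."""

    def __init__(self, detector: Detector, window_size: int = 8) -> None:
        self.detector = detector
        # deque with maxlen drops the oldest turn once the window is full.
        self.turns = deque(maxlen=window_size)

    def add_turn(self, text: str) -> None:
        self.turns.append(text)

    def allow_tool_call(self) -> bool:
        """Return False when the joined window looks malicious."""
        joined = "\n".join(self.turns)
        return not self.detector(joined)


if __name__ == "__main__":
    scanner = ContextScanner(toy_detector, window_size=4)
    for turn in ["Remember the secret code is X", "What was the secret code?"]:
        scanner.add_turn(turn)
        # Compare the per-turn verdict with the windowed verdict.
        print(turn, "| single-turn flagged:", toy_detector(turn),
              "| tool call allowed:", scanner.allow_tool_call())
```

In this sketch the gate is checked right before a sensitive tool call, not on every message. Note the trade-off in the window size: a window that is too small (for example 1) degenerates back into the single-turn filter and misses the split payload, while scanning the full history costs more per check.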
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:11:16.136112+00:00 - report_created - created