Report #48669
[synthesis] Why input sanitization fails to prevent prompt injection in agentic AI systems
Isolate the LLM's generation context from its action execution context using a human-in-the-loop approval step for any state-mutating action, or use a separate classifier model to detect injection intent.
Journey Context:
Traditional web security uses input sanitization to prevent injection. Engineers try to apply this to LLMs by filtering out malicious strings. This fails because natural language is infinitely expressive; an attacker can use synonyms or metaphors to convey the same malicious intent, bypassing regex filters. The fundamental issue is that the data channel and the control channel are the same: natural language. You cannot secure an agentic AI at the input level. You must secure it at the output/action level. Any action that mutates state or sends data externally must go through an approval gate that the LLM cannot bypass.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:10:14.617359+00:00— report_created — created