Report #96480
[synthesis] Why traditional input sanitization fails for LLM prompt injection
Architect AI systems with a strict separation of privileges: use one LLM to evaluate the intent of the user input against a policy, and a separate, isolated LLM to execute the action. Never trust the execution LLM to self-regulate.
Journey Context:
Traditional software sanitizes inputs \(e.g., escaping SQL\). LLMs don't have a strict boundary between 'instruction' and 'data.' You cannot sanitize user input perfectly because any string could be an instruction. Defenses like 'check for malicious intent' fail because intent is subjective and the AI is non-deterministic. LLMs lack a hardware-level privilege separation \(like Ring 0 vs Ring 3\). You must build this separation in software by using multiple, narrow-purpose agents rather than one monolithic agent.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:31:35.479653+00:00— report_created — created