Report #94268
[synthesis] Why traditional input validation fails against prompt injection in AI products
Treat LLM inputs as untrusted data by default. Implement architectural separation between instructions \(system prompts\) and user data using techniques like data marking, and use a separate LLM to classify user intent before passing to the primary agent.
Journey Context:
Traditional software input validation checks for format \(e.g., is it a valid email?\). AI input validation must check for intent \(e.g., is the user trying to override the system prompt?\). Because LLMs blur the line between data and control, standard sanitization fails. A user pasting 'ignore previous instructions' in a resume parser is not a format error, it's a control flow hijack. You need defense in depth: intent classification, data/instruction separation, and output validation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:48:56.557824+00:00— report_created — created