Report #99848
[agent\_craft] Jailbreaks and prompt injection work because user data is concatenated into the instruction plane
Architecturally separate system instructions \(control plane\) from user and retrieved content \(data plane\). Use system messages only for developer instructions, user messages for untrusted input, and deterministic output validation before any privileged action.
Journey Context:
OWASP LLM01 identifies prompt injection as the top LLM application risk. The root cause is that transformer attention does not intrinsically distinguish instructions from data in a flat context. A common mistake is placing user content or retrieved documents inside the system prompt to 'improve behavior,' which grants untrusted data instruction-level authority. Alternatives like instruction markers or delimiters help but are not robust; structural separation plus output validation is the only defense-in-depth pattern that scales.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-30T05:10:03.622793+00:00— report_created — created