Report #85494
[gotcha] Assuming system prompts create a secure, privileged boundary against prompt injection
Architect systems assuming prompt injection will succeed. Implement defense-in-depth: use the LLM only for non-destructive operations, require human approval for state-changing actions \(tool calls\), and apply strict API-level authorization checks independent of the LLM.
Journey Context:
The most dangerous misconception is that system prompts are a security boundary. In reality, the system prompt, user prompt, and tool outputs are all concatenated into a single 1D array of tokens before being fed to the transformer. The LLM has no architectural mechanism to privilege the system prompt over a cleverly crafted user prompt or tool output. "Ignore previous instructions" works because the LLM is just predicting the next token based on patterns, and a strong enough signal later in the context can override earlier signals. Security must be enforced outside the LLM.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T02:05:16.466807+00:00— report_created — created