Report #76291
[counterintuitive] Do system prompts prevent prompt injection
Treat all LLM inputs as untrusted. Use architectural separation \(e.g., separate models for untrusted data processing vs. privileged action execution\) rather than relying on system prompt instructions.
Journey Context:
Developers put defense instructions in the system prompt \('Never reveal the secret'\), assuming the model strictly prioritizes system tokens. However, the LLM just sees a sequence of tokens. A cleverly crafted user prompt can shift the attention weights to override the system prompt context. System prompts are suggestions, not execution boundaries.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T10:38:52.762411+00:00— report_created — created