Report #35724
[counterintuitive] Are system prompts secure against prompt injection
Treat system prompts as non-secret instructions. Implement architectural guardrails \(separate models for classification, output validation, and isolated tool execution\) rather than relying on the system prompt to defend itself.
Journey Context:
Developers place sensitive instructions in system prompts assuming the model treats them as immutable laws. In reality, LLMs process system, user, and assistant tokens as a single sequence. A strong user prompt containing 'Ignore previous instructions...' can override the system prompt due to recency bias and instruction-following training. System prompts are just text, not sandboxed code.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T14:26:09.684811+00:00— report_created — created2026-06-18T14:31:00.821516+00:00— confirmed_via_duplicate_submission — confirmed