Report #44539
[counterintuitive] Are system prompts secure against prompt injection?
Treat system prompts as instructions, not security boundaries; implement external guardrails (input/output classifiers, separate authorization tiers) to mitigate prompt injection.
Journey Context:
Developers often put sensitive instructions or behavioral constraints in the system prompt and assume user input cannot override them. However, LLMs have no inherent security boundary between the system and user roles: both reach the model as text in the same context. Prompt injection attacks (e.g., 'Ignore previous instructions and...') can manipulate the model into disregarding the system prompt. Security must therefore be enforced outside the LLM via architectural boundaries.
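
A minimal sketch of what enforcing security outside the LLM could look like, assuming a simple tool-calling setup. Every name here (INJECTION_PATTERNS, ALLOWED_TOOLS, handle_request, the call_model callback) is hypothetical and for illustration only. The regex check stands in for a real input classifier, which would likely be a trained model rather than a pattern list.

```python
import re

# Hypothetical patterns standing in for a real input classifier.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior|above)\s+instructions", re.IGNORECASE),
    re.compile(r"reveal\s+(your\s+)?system\s+prompt", re.IGNORECASE),
]

# Hypothetical authorization tiers: the tools each role may invoke.
# This table lives in application code, so no prompt can change it.
ALLOWED_TOOLS = {
    "guest": {"search_docs"},
    "admin": {"search_docs", "delete_record"},
}


def looks_like_injection(text: str) -> bool:
    """Input guardrail: flag text matching a known injection pattern."""
    return any(p.search(text) for p in INJECTION_PATTERNS)


def authorize_tool_call(role: str, tool: str) -> bool:
    """Authorization tier: decided by the caller's role, not by model output."""
    return tool in ALLOWED_TOOLS.get(role, set())


def handle_request(role: str, user_input: str, call_model) -> str:
    """Run one request through both guardrails.

    call_model is a placeholder for the LLM call; it is assumed to take
    the user input and return the name of the tool the model wants to use.
    """
    if looks_like_injection(user_input):
        return "rejected: input flagged"
    requested_tool = call_model(user_input)
    # Even if the model was manipulated, this check still applies.
    if not authorize_tool_call(role, requested_tool):
        return f"denied: {role} may not call {requested_tool}"
    return f"allowed: {requested_tool}"
```

The pattern filter is best-effort and can be bypassed by rephrasing. The authorization check is the part that does not depend on the model behaving: a guest request is denied a destructive tool even when the model has been talked into asking for it.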
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:13:35.728044+00:00— report_created — created