Report #61922
[counterintuitive] system prompts are a security boundary
Treat system prompts as soft instructions, not hard constraints; implement external guardrails and strict input/output validation to limit the impact of prompt injection.
Journey Context:
Developers often put safety rules in the system prompt (e.g., 'Never reveal the password') and assume the model will obey them over user instructions. In practice, LLMs cannot reliably distinguish system instructions from user data, especially when user input contains adversarial prompt injections. A system prompt is just text prepended to the context: it is a soft alignment tool that sufficiently forceful or cleverly crafted user input can override. Security must therefore be enforced outside the model.
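One way to enforce checks outside the model is to wrap every model call in input and output validation. The sketch below is illustrative only and is not taken from the report: `SECRET`, `INJECTION_PATTERNS`, `check_input`, `check_output`, and `guarded_call` are hypothetical names, and `call_model` stands in for whatever client function the application actually uses. The regex deny-list is deliberately small and would not catch every injection, so treat it as one layer rather than a complete defense.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical secret the application must never disclose (assumption for this sketch).
SECRET = "hunter2-example"

# Illustrative deny-list only; real injections vary widely and no list is complete.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.IGNORECASE),
    re.compile(r"reveal .*(system prompt|password)", re.IGNORECASE),
]


@dataclass
class GuardResult:
    allowed: bool
    text: str
    reason: str = ""


def check_input(user_text: str, max_len: int = 2000) -> GuardResult:
    """Reject oversized input and input matching a known injection pattern."""
    if len(user_text) > max_len:
        return GuardResult(False, "", "input too long")
    for pattern in INJECTION_PATTERNS:
        if pattern.search(user_text):
            return GuardResult(False, "", "possible prompt injection")
    return GuardResult(True, user_text)


def check_output(model_text: str, secret: str = SECRET) -> GuardResult:
    """Withhold any response containing the secret, ignoring case and whitespace."""
    normalized = re.sub(r"\s+", "", model_text).lower()
    if re.sub(r"\s+", "", secret).lower() in normalized:
        return GuardResult(False, "[response withheld]", "secret leaked")
    return GuardResult(True, model_text)


def guarded_call(user_text: str, call_model: Callable[[str], str]) -> str:
    """Run the model only on validated input and return only validated output."""
    checked = check_input(user_text)
    if not checked.allowed:
        return "Request rejected."
    raw = call_model(checked.text)
    return check_output(raw).text
```

The output check runs in ordinary code regardless of what the model was told, so it still applies when an injection gets past the input filter and the system prompt. A stronger design, where feasible, is to keep the secret out of the model's context entirely so there is nothing for it to leak.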
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T10:25:16.714726+00:00 - report_created - created