Report #67690
[counterintuitive] system prompt prevents prompt injection
Treat system prompts as non-confidential, overridable instructions. Use external guardrails (input/output classifiers) and least-privilege API permissions for security.
Journey Context:
Developers often assume the 'System' role is separated from the 'User' role by a hard architectural boundary. In reality, role separation is a learned convention rather than an enforced one: LLMs are trained on internet data in which such boundaries are fluid, which leaves them susceptible to prompt injection (e.g., 'ignore previous instructions'). Security must therefore be enforced outside the generative model.
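As a rough illustration of enforcing security outside the model, the sketch below wraps a model call with three independent checks: an input screen, a tool allowlist (least privilege), and an output filter. All names here (`INJECTION_PATTERNS`, `ALLOWED_TOOLS`, `ToolCall`, `looks_like_injection`, `authorize_tool_call`, `redact_output`) are hypothetical and not taken from any specific library. The regex screen is only a stand-in for a real input classifier and is easy to bypass, so treat it as a placeholder rather than a defense.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable

# Assumption: a few illustrative phrases. A real deployment would call a
# trained input classifier here; regexes alone are trivially evaded.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior|above)\s+instructions", re.I),
    re.compile(r"(reveal|print|show)\s+(your\s+)?system\s+prompt", re.I),
]

# Least privilege: the only tools this session may trigger, regardless of
# what the model asks for. Tool names are hypothetical.
ALLOWED_TOOLS = frozenset({"search_docs", "get_order_status"})


@dataclass
class ToolCall:
    """A tool invocation proposed by the model (hypothetical shape)."""
    name: str
    arguments: dict = field(default_factory=dict)


def looks_like_injection(user_text: str) -> bool:
    """Input guardrail: flag text matching any known injection phrase."""
    return any(p.search(user_text) for p in INJECTION_PATTERNS)


def authorize_tool_call(call: ToolCall,
                        allowed: Iterable[str] = ALLOWED_TOOLS) -> bool:
    """Permission check enforced in application code, not in the prompt."""
    return call.name in set(allowed)


def redact_output(model_text: str, secrets: Iterable[str]) -> str:
    """Output guardrail: replace known sensitive strings before returning."""
    for secret in secrets:
        if secret:
            model_text = model_text.replace(secret, "[REDACTED]")
    return model_text
```

The design point is that each check runs in ordinary application code, so an injected instruction that persuades the model still cannot grant itself a tool outside `ALLOWED_TOOLS` or leak a string the output filter removes.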
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T20:05:53.973326+00:00— report_created — created