Report #95238
[counterintuitive] Are system prompts a secure way to restrict LLM behavior?
Treat system prompts as behavioral guidelines, not security perimeters; implement external guardrails (input/output classifiers, regex checks) for actual security enforcement.
Journey Context:
Developers put sensitive instructions or strict rules in system prompts, assuming the user cannot change or read them. In reality, user input can override those instructions or extract them via prompt injection. The system prompt and the user message reach the model as one token sequence, so the LLM cannot reliably distinguish 'trusted system instructions' from 'untrusted user data' when the user data contains cleverly disguised instructions. Security must be enforced outside the model's generative loop.
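A minimal sketch of what enforcement outside the generative loop could look like, assuming a regex-based input check and a substring-based output check. Every name here (`INJECTION_PATTERNS`, `input_allowed`, `output_allowed`, `guarded_call`, `REFUSAL`) is hypothetical and chosen for illustration, and `model` stands in for whatever function actually calls the LLM. The patterns are examples only; they are easy to evade and are not a complete injection filter.

```python
import re
from typing import Callable, Sequence

# Hypothetical example patterns; a real deployment would likely pair
# these with a trained classifier rather than rely on regex alone.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior|above)\s+instructions", re.IGNORECASE),
    re.compile(r"(reveal|print|repeat)\s+(your\s+)?(system\s+prompt|instructions)", re.IGNORECASE),
]

REFUSAL = "Request blocked by policy."


def input_allowed(user_text: str) -> bool:
    """Return False if the user text matches a known injection pattern."""
    return not any(p.search(user_text) for p in INJECTION_PATTERNS)


def output_allowed(model_text: str, secrets: Sequence[str]) -> bool:
    """Return False if the reply contains any protected string (case-insensitive)."""
    lowered = model_text.lower()
    # Empty strings are skipped because they would match every reply.
    return not any(s and s.lower() in lowered for s in secrets)


def guarded_call(
    model: Callable[[str, str], str],
    system_prompt: str,
    user_text: str,
    secrets: Sequence[str],
) -> str:
    """Wrap a model call with checks that run outside the model itself."""
    if not input_allowed(user_text):
        return REFUSAL
    reply = model(system_prompt, user_text)
    if not output_allowed(reply, secrets):
        return REFUSAL
    return reply


if __name__ == "__main__":
    # Stand-in model that leaks its system prompt, to exercise the output check.
    def leaky_model(system_prompt: str, user_text: str) -> str:
        return "My instructions are: " + system_prompt

    print(guarded_call(leaky_model, "Code word: SECRET-123", "hello", ["SECRET-123"]))
```

The output check is the part that matters most in this sketch: even when an injection gets past the input filter and the model complies, the reply is still inspected before it reaches the user.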
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:26:12.677082+00:00— report_created — created