Report #60920
[counterintuitive] Are system prompts a secure way to protect LLM behavior from user manipulation
Treat system prompts as advisory, not authoritative; use external guardrails \(input/output filters, separate moderation models\) for security.
Journey Context:
Developers put sensitive instructions \(e.g., 'never reveal the secret key'\) in the system prompt, assuming the model treats it as an immutable rule. However, LLMs are next-token predictors, and user prompts can easily override system instructions via prompt injection, social engineering, or simply strong directive phrasing. System prompts are just text with a different role label; they do not enforce hard computational constraints.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:44:34.800375+00:00— report_created — created