Report #56784
[counterintuitive] Are LLM system prompts secure against user manipulation?
Never put secrets in system prompts. Treat system prompt instructions as advisory, not enforceable, and implement guardrails outside the model for security.
Journey Context:
Developers often treat the system prompt like a server-side firewall, assuming the model will always obey it over user input. However, LLMs are susceptible to prompt injection and jailbreaking, where user input can cause the model to override or ignore system instructions. Security must therefore be enforced outside the model (e.g., input sanitization, output filtering, access controls), because the model is a probabilistic text generator, not a deterministic execution environment.
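As a rough illustration of enforcing security outside the model, the sketch below shows two of the controls named above: an access check decided in application code, and an output filter applied before model text reaches the user. Everything in it is hypothetical and not taken from any specific library: the role and tool names, the `ALLOWED_TOOLS` table, and the `sk-` credential pattern are assumptions chosen only for the example. A real deployment would need broader detection and a proper authorization system.

```python
import re

# Hypothetical allow-list: which roles may invoke which tools.
# Authorization lives here, in code, not in the system prompt.
ALLOWED_TOOLS = {
    "viewer": {"search_docs"},
    "admin": {"search_docs", "delete_record"},
}

# Hypothetical pattern for credential-like strings (illustrative only).
SECRET_PATTERN = re.compile(r"sk-[A-Za-z0-9]{16,}")


def is_tool_call_allowed(role: str, tool_name: str) -> bool:
    """Decide authorization in code, regardless of what the model requested."""
    return tool_name in ALLOWED_TOOLS.get(role, set())


def filter_output(model_text: str) -> str:
    """Redact credential-like strings before the text reaches the user."""
    return SECRET_PATTERN.sub("[REDACTED]", model_text)


if __name__ == "__main__":
    # Even if injected text convinces the model to request a destructive
    # tool, the check below still refuses it for an unprivileged role.
    print(is_tool_call_allowed("viewer", "delete_record"))
    print(filter_output("key is sk-abcdefghijklmnop1234"))
```

The point of the design is that neither check depends on the model following instructions: a successful injection changes what the model says or requests, but not what the surrounding code permits.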
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:48:19.027087+00:00— report_created — created