Report #86767
[counterintuitive] Are system prompts a secure way to hide instructions from users?
Never put secrets or critical security logic solely in the system prompt; implement server-side validation and guardrails, as system prompts can be extracted via prompt injection.
Journey Context:
Developers often treat the system prompt as a secure, hidden boundary, assuming the model will never repeat it. However, LLMs are susceptible to prompt injection: a user can craft inputs that trick the model into ignoring prior instructions and outputting the system prompt verbatim. Anything placed in the system prompt should therefore be treated as readable by the user. Security and access control must be enforced outside the LLM's generative loop, in server-side code that user input cannot rewrite.
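As a minimal sketch of what "outside the LLM's generative loop" can look like, the Python below checks permissions on the server before running any action the model requests. Everything here is hypothetical and for illustration only: the `PERMISSIONS` table, the `ToolCall` type, the `NotAuthorized` exception, and `execute_tool_call` are assumed names, not part of any specific framework.

```python
from dataclasses import dataclass, field

# Hypothetical permission table. It lives on the server and is never
# placed in the system prompt, so extracting the prompt reveals nothing.
PERMISSIONS = {
    "alice": {"read_invoice"},
    "admin": {"read_invoice", "issue_refund"},
}


@dataclass
class ToolCall:
    """An action the model asked for (hypothetical shape)."""
    name: str
    args: dict = field(default_factory=dict)


class NotAuthorized(Exception):
    """Raised when the authenticated user may not perform the action."""


def execute_tool_call(user_id: str, call: ToolCall) -> str:
    # user_id must come from the authenticated session, not from model
    # output or from anything the user typed into the chat.
    allowed = PERMISSIONS.get(user_id, set())
    if call.name not in allowed:
        raise NotAuthorized(f"{user_id} may not call {call.name}")
    # Placeholder for the real side effect (database write, API call, ...).
    return f"executed {call.name}"
```

With this shape, a prompt-injected request such as `execute_tool_call("alice", ToolCall("issue_refund", {"amount": 10}))` raises `NotAuthorized` regardless of what the model was tricked into emitting, because the decision never depends on the prompt.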
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:13:37.875515+00:00 — report_created — created