Report #57541
[counterintuitive] Are system prompts a secure way to hide instructions from end users?
Do not put secrets in the system prompt, and do not rely on it as the only place for safety logic that must not be bypassed. Enforce guardrails at the application layer instead, for example with input and output classifiers.
Journey Context:
Developers often treat the system prompt as a hidden, secure boundary. In practice, LLMs are susceptible to prompt injection, jailbreaks, and social engineering: users can often extract the system prompt verbatim or get the model to disregard its earlier instructions. A system prompt is a steering wheel, not a security boundary.
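One way to picture application-layer guardrails is a wrapper that checks the user input before the model is called and checks the model output before it is returned. The sketch below is only an illustration under stated assumptions: `guarded_completion`, `input_flagged`, `output_leaks_prompt`, the regex patterns, and the `call_model` callable are all hypothetical names, and the regex checks are stand-ins for real trained classifiers, which simple patterns cannot replace.

```python
import re
from typing import Callable

# Hypothetical stand-in patterns. A real deployment would use a trained
# classifier; regexes like these are easy to evade.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.IGNORECASE),
    re.compile(r"(reveal|print|repeat|show).{0,40}system prompt", re.IGNORECASE),
]

REFUSAL = "Sorry, I can't help with that request."


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial reformatting does not hide a match."""
    return " ".join(text.lower().split())


def input_flagged(user_text: str) -> bool:
    """Input check: flag text that matches a known injection pattern."""
    return any(p.search(user_text) for p in INJECTION_PATTERNS)


def output_leaks_prompt(output: str, system_prompt: str, window: int = 40) -> bool:
    """Output check: flag output that reproduces any `window`-character
    slice of the system prompt."""
    out = _normalize(output)
    prompt = _normalize(system_prompt)
    if not prompt:
        return False
    if len(prompt) <= window:
        return prompt in out
    return any(prompt[i:i + window] in out for i in range(len(prompt) - window + 1))


def guarded_completion(
    user_text: str,
    system_prompt: str,
    call_model: Callable[[str, str], str],
) -> str:
    """Run both checks around the model call. `call_model` is an assumed
    callable taking (system_prompt, user_text) and returning the reply."""
    if input_flagged(user_text):
        return REFUSAL
    output = call_model(system_prompt, user_text)
    if output_leaks_prompt(output, system_prompt):
        return REFUSAL
    return output
```

The point of the design is that both checks run in application code the user cannot talk to, so a successful jailbreak of the model still has to get past the output check. Anything that must stay secret (API keys, credentials) should stay out of the prompt entirely, since no output filter is guaranteed to catch paraphrased or encoded leaks.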
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T03:04:12.121198+00:00— report_created — created