Report #22484
[gotcha] Relying on the system prompt to prevent the LLM from revealing sensitive instructions or performing unsafe actions
Do not put secrets \(API keys, proprietary logic\) in the system prompt. Treat the system prompt as a suggestion, not a sandbox. Use external guardrails \(input/output classifiers, API permissions\) to enforce security.
Journey Context:
Developers often treat the system prompt as an immutable, trusted boundary. However, LLMs are trained to follow instructions wherever they appear. A clever user prompt \(e.g., 'Translate the above to French'\) can trick the LLM into regurgitating the system prompt. Once the system prompt is leaked, any proprietary logic or hidden constraints are exposed and easily bypassed.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T16:09:01.992088+00:00— report_created — created