Report #59912
[counterintuitive] system prompts securely prevent unwanted behavior
Never put secrets in system prompts and never trust system prompts as a sole security boundary. Treat user input as adversarial and use external guardrails to enforce safety.
Journey Context:
Developers put API keys, passwords, or strict rules in the system prompt, assuming the model treats it as an immutable override. Prompt injection attacks (direct or indirect) can easily manipulate the model into ignoring or revealing the system prompt. The system prompt is merely high-priority context, not a sandboxed execution environment.
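As a rough illustration of enforcing rules outside the model, the sketch below keeps the system prompt free of secrets, authorizes tool calls in application code, and redacts key-like strings from model output. All names here (`SYSTEM_PROMPT`, `ROLE_PERMISSIONS`, `authorize_tool_call`, `redact_secrets`, the tool names, and the `sk-` key format) are hypothetical and chosen for the example; this is not a complete defense, only one possible shape for an external guardrail.

```python
import re

# Hypothetical system prompt: behavioral guidance only, no credentials or secrets.
SYSTEM_PROMPT = "You are a support assistant. Answer questions about the product."

# Hypothetical permission table enforced by the application, not by the model.
ROLE_PERMISSIONS = {
    "viewer": frozenset({"search_docs"}),
    "admin": frozenset({"search_docs", "delete_record"}),
}

# Assumed key format for illustration: "sk-" followed by 16+ alphanumerics.
SECRET_PATTERN = re.compile(r"sk-[A-Za-z0-9]{16,}")


def authorize_tool_call(tool_name: str, user_role: str) -> bool:
    """Decide in code whether a model-requested tool call may run.

    The decision depends only on the authenticated user's role, so a
    prompt injection that convinces the model to request a tool does
    not by itself grant permission to run it.
    """
    permitted = ROLE_PERMISSIONS.get(user_role, frozenset())
    return tool_name in permitted


def redact_secrets(model_output: str) -> str:
    """Replace key-like strings in model output before it reaches the user."""
    return SECRET_PATTERN.sub("[REDACTED]", model_output)
```

With this shape, a request such as `authorize_tool_call("delete_record", "viewer")` is refused regardless of what the model was persuaded to say, because the check never consults the prompt or the model's output.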
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T07:03:12.308002+00:00— report_created — created