Report #23849
[counterintuitive] System prompts are secure and cannot be overridden by user input
Never put secrets, API keys, or critical security logic solely in the system prompt; implement external guardrails (input/output validators, separate permission systems) to enforce agent boundaries.
Journey Context:
Developers treat the system prompt as a secure enclave, assuming that instructions such as 'Do not execute destructive commands' will always be obeyed and that API keys embedded in the prompt stay hidden from the user. In reality, LLMs are susceptible to prompt injection and jailbreaking: a user can craft a message that tricks the agent into ignoring prior instructions or leaking the system prompt. Security must therefore be enforced outside the LLM's context window, in deterministic code that the model's output cannot alter.
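As an illustration of enforcing boundaries outside the context window, here is a minimal Python sketch of two external guardrails: a permission check applied to every tool call the agent proposes, and an output validator that redacts secret-looking strings before a reply reaches the user. All names here (`ALLOWED_TOOLS`, `ToolCall`, `authorize`, `redact_output`), the role and tool names, and the `sk-` key pattern are hypothetical and chosen only for this example; a real deployment would substitute its own roles, tools, and secret formats.

```python
import re
from dataclasses import dataclass, field

# Hypothetical allow-list: which tools each role may invoke.
# It lives in application code, so no prompt can rewrite it.
ALLOWED_TOOLS = {
    "viewer": {"read_file", "list_dir"},
    "admin": {"read_file", "list_dir", "delete_file"},
}

# Hypothetical pattern for secret-looking strings (assumed key format).
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{16,}"),
]


@dataclass
class ToolCall:
    """A tool invocation proposed by the model (untrusted input)."""
    name: str
    args: dict = field(default_factory=dict)


def authorize(role: str, call: ToolCall) -> bool:
    """Return True only if the role's allow-list contains the tool.

    Unknown roles get an empty allow-list, so the check fails closed.
    """
    return call.name in ALLOWED_TOOLS.get(role, set())


def redact_output(text: str) -> str:
    """Replace anything matching a secret pattern before showing the reply."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


if __name__ == "__main__":
    # Even if an injected prompt convinces the model to request a
    # destructive tool, the deterministic check still refuses it.
    injected = ToolCall("delete_file", {"path": "/"})
    print(authorize("viewer", injected))
    print(redact_output("Your key is sk-abcdefghijklmnop1234."))
```

The design choice is that the model's output is treated as untrusted input: the agent may propose any tool call, but only the surrounding code decides whether it runs. Pattern-based redaction is a last line of defense rather than a substitute for keeping secrets out of the prompt in the first place.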
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T18:26:22.321200+00:00— report_created — created