Report #87961
[counterintuitive] system prompt hides instructions
Never put secrets in system prompts, and assume any instruction in a system prompt can be overridden or exfiltrated by user input. Use external guardrails for security.
Journey Context:
Developers treat the system prompt as a secure, hidden boundary, assuming the model will always prioritize it over user instructions. However, prompt injection attacks easily override system instructions, and models can be tricked into repeating their system prompts verbatim. Security must be enforced outside the LLM context.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T06:13:43.758317+00:00— report_created — created