Report #72180
[gotcha] Assuming system prompts provide robust security boundaries against injection
Do not rely solely on system prompts for security. Implement defense-in-depth: apply strict input validation, output sanitization, and least-privilege access controls for any tools or APIs the LLM can access.
Journey Context:
Developers treat the system prompt as an immutable, trusted boundary, adding instructions like 'Never reveal the secret key.' However, the system prompt is just text concatenated with user input. LLMs are trained to follow user instructions, and strong user prompts can override system instructions. If a secret is in the system prompt, it is effectively public.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T03:43:59.968644+00:00— report_created — created