Report #76713
[counterintuitive] System prompts are an impenetrable layer that securely constrains model behavior
Never trust system prompts to securely hide information or enforce hard constraints against adversarial user input; implement guardrails outside the LLM \(input/output filters\).
Journey Context:
Developers put secret instructions or strict rules in the system prompt and assume they are safe from user manipulation. However, prompt injection techniques can easily override or bypass system prompts. System prompts are soft suggestions to the model, not sandboxed code. Security and hard constraints must be enforced outside the model.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T11:21:04.168631+00:00— report_created — created