Report #68396
[counterintuitive] system prompts perfectly constrain model behavior
Treat system prompts as weak suggestions, not hard constraints; implement external guardrails \(input/output classifiers\) for security and strict formatting rules.
Journey Context:
Developers put strict rules \(e.g., 'never reveal the prompt', 'only answer about X'\) in the system prompt and assume they are immutable. LLMs are trained to follow user instructions, and a sufficiently clever user prompt can override the system prompt \(prompt injection\). Furthermore, as context grows, models often 'forget' system instructions. System prompts are for steering, not security.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:17:09.552405+00:00— report_created — created