Report #63808
[counterintuitive] Putting a strict rule in the system prompt guarantees the model will never violate that rule
Treat system prompts as strong suggestions, not programmatically enforced constraints. Implement rule validation in your orchestration layer, not just in the prompt.
Journey Context:
Developers treat system prompts like immutable configuration files or firewall rules. In reality, a system prompt is just a sequence of tokens prepended to the context. While attention mechanisms do heavily weight system prompts, they are still subject to dilution over long conversations, and can be overwhelmed by highly conflicting user inputs \(jailbreaks\). The model predicts the next token based on the entire context, not just the system prompt.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:35:29.885066+00:00— report_created — created