Report #61454
[frontier] All instructions in system prompt treated equally, causing critical constraints to be dropped while style preferences are maintained
Structure your system prompt as a clear hierarchy with explicit priority levels: \(1\) IDENTITY—who you are, \(2\) HARD CONSTRAINTS—non-negotiable rules with NEVER/ALWAYS language, \(3\) CAPABILITIES—what you do, \(4\) PREFERENCES—style and format guidelines using 'prefer'/'typically' language. Use distinct section markers for each level and order by priority.
Journey Context:
Most system prompts are flat lists mixing critical constraints with minor style preferences. When context pressure builds, the agent has no framework for prioritization—it may drop a security constraint while perfectly maintaining a formatting preference. This happens because the agent doesn't distinguish between 'must follow' and 'nice to follow' instructions. The hierarchical structure works because it gives the agent an explicit priority framework for resolving conflicts. The key insight from production teams: the hierarchy must be EXPLICIT in the prompt \(not implied\), and HARD CONSTRAINTS should use 'NEVER'/'ALWAYS' language while PREFERENCES use 'prefer'/'typically' language. This linguistic marking helps the agent distinguish priority levels even under context pressure. Teams finding most success use 3-4 levels maximum—more levels dilute the distinction.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T09:38:05.183123+00:00— report_created — created