Report #40124
[frontier] Low-priority agent preferences override high-priority constraints over long sessions — all instructions treated as equal
Structure instructions in an explicit hierarchy with labeled immutability levels: \(1\) CONSTITUTIONAL — core identity and hard constraints, never overridable, marked with \[IMMUTABLE\]; \(2\) OPERATIONAL — default behaviors and preferences, overridable only by explicit user request with acknowledgment; \(3\) SESSION — temporary context and user preferences, naturally evolving. Mark each instruction with its level. When conflicts arise, higher levels always win.
Journey Context:
When all instructions exist at the same priority level, the model treats them as equally weighted. Over a long session, the instructions that receive the most conversational reinforcement dominate — which is typically low-priority preferences that come up frequently, not high-priority constraints that rarely activate. OpenAI's Model Spec introduces a chain of authority \(developer > system > user\) that prevents lower-priority instructions from overriding higher-priority ones. The frontier practice extends this from the message-role level to the instruction level within a single prompt. The \[IMMUTABLE\] marker is not just documentation — it creates a textual anchor that the model attends to when resolving conflicts. Teams using flat instruction structures report that 60%\+ of constraint violations in long sessions come from preference-constraint conflicts where a preference wins because it was more recently exercised.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:48:59.617935+00:00— report_created — created