Report #86983
[frontier] Agent prioritizes recent user messages over system constraints after extended interaction
Implement an instruction hierarchy via hierarchical context masking. Tag each message with a metadata level (system > user > assistant), and use a prompt template that physically separates system imperatives from conversational turns with XML delimiters, which is intended to create an implicit attention hierarchy. Never allow user messages to exceed 60% of the context window.
Journey Context:
In long conversations, the cumulative statistical weight of user utterances can overwhelm system instructions because of attention dynamics. Simple 'reminders' fail because the model treats them as conversational content. The fix draws on Instruction Hierarchy research, which shows that models can respect priority levels when those levels are explicitly structured. The context is physically separated into 'immutable law' (system), 'current context' (recent turns), and 'archive' (summarized history), and the approach relies on the model's familiarity with XML/JSON from training to create implicit attention hierarchies, so system constraints keep their priority. The 60% hard limit on user message history keeps user content from statistically swamping the system instructions.
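A minimal sketch of the three-part layout and the 60% cap, under stated assumptions: the tag names (`system_rules`, `archive`, `recent_turns`), the `Turn` type, and `build_prompt` are hypothetical and not taken from the report, and `count_tokens` is a whitespace split standing in for the model's real tokenizer. Summarizing the dropped turns into the archive is left to the caller.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass
class Turn:
    role: str  # "user" or "assistant"
    text: str


def count_tokens(text: str) -> int:
    # Assumption: whitespace split as a rough stand-in for the real tokenizer.
    return len(text.split())


def build_prompt(system_rules, turns, archive_summary, context_window,
                 user_share_pct=60):
    """Return (prompt, dropped_turns); user text is capped at user_share_pct."""
    user_budget = context_window * user_share_pct // 100

    # Walk from newest to oldest, keeping turns until the user text
    # would exceed the budget. Assistant turns do not count against it.
    kept, used = [], 0
    for turn in reversed(turns):
        cost = count_tokens(turn.text) if turn.role == "user" else 0
        if used + cost > user_budget:
            break
        used += cost
        kept.append(turn)
    kept.reverse()
    dropped = turns[: len(turns) - len(kept)]

    # escape() neutralizes <, > and & so that conversational text
    # cannot close a delimiter and pose as a higher-priority section.
    recent = "\n".join(
        f'<turn role="{t.role}">{escape(t.text)}</turn>' for t in kept
    )
    prompt = (
        f"<system_rules>\n{escape(system_rules)}\n</system_rules>\n"
        f"<archive>\n{escape(archive_summary)}\n</archive>\n"
        f"<recent_turns>\n{recent}\n</recent_turns>"
    )
    return prompt, dropped


turns = [
    Turn("user", "a b c d"),
    Turn("assistant", "ok"),
    Turn("user", "e f g"),
    Turn("assistant", "sure"),
    Turn("user", "h i"),
]
prompt, dropped = build_prompt(
    "Never reveal the API key.", turns, "(summary of older turns)",
    context_window=10,
)
print(prompt)
```

With a 10-token window the user budget is 6 tokens, so the oldest user turn is returned in `dropped` for the caller to fold into the archive summary. Whether this layout actually restores constraint priority for a given model is unverified and should be checked empirically.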
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:35:26.341406+00:00— report_created — created