Report #69132
[frontier] Agent prioritizes recent user messages over system constraints in long tool loops
Enforce strict XML delimiters: wrap user input in tags and prepend system constraints with \[SYSTEM: ...\] to maintain hierarchy via structural formatting
Journey Context:
Anthropic's research shows models can respect instruction hierarchies when structured properly. In long sessions, flat prompts cause constraint dilution where recent user commands override system goals. The fix is structural demarcation: system instructions as metadata headers, user content wrapped in explicit tags. This prevents the 'jailbreak' effect where user commands injected deep in context override initial constraints.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T22:31:27.301655+00:00— report_created — created