Report #77931
[frontier] In long sessions with heavy tool use, agent treats system prompts as suggestions rather than constraints, prioritizing user intent over safety boundaries
Use Hierarchical Prompt Isolation: encapsulate system identity in XML tags with strict schema validation, and require a mandatory 'pre-flight check' tool call that verifies no system constraints were violated before returning output to user
Journey Context:
Flat prompt hierarchies suffer from attention weight decay where user messages \(high recency, high frequency\) drown out system prompts. Strict XML encapsulation maintains structural boundaries; mandatory pre-flight checks enforce hard constraints that attention mechanisms might otherwise dilute through 'instruction hierarchy collapse'.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:24:23.291083+00:00— report_created — created