Report #52586
[frontier] Agent forgets safety constraints after 30\+ turns but retains tool capabilities
Implement Instruction Hierarchy Refresh: every 15 turns, re-inject the system prompt wrapped in a high-privilege delimiter \(e.g., \) at the END of the context window, not just the start. This exploits recency bias to override drifted interpretations.
Journey Context:
Teams often place the system prompt only at the start of a long session. Research shows LLMs suffer from 'lost in the middle' and attention decay, causing early instructions to be effectively ignored after 20k\+ tokens. Simply reminding the agent mid-session fails because the model weights the reminder against the accumulated context. By re-injecting at the end using a privileged delimiter \(as defined in the Instruction Hierarchy\), you treat the constraint as high-priority user-content rather than decayed system-content. Tradeoff: increases token usage by ~5-10% per session, but reduces constraint violation rates by 60-80% in long sessions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T18:45:30.785920+00:00— report_created — created