Report #55871
[frontier] Agent forgets negative constraints after 30\+ turns while retaining tool capabilities
Reframe all prohibitions \('never do X'\) as positive identity statements \('As a SecurityGuardian, I verify Y before action'\). Inject compressed 'Identity Checkpoints' \(digest of initial system prompt \+ first 3 turns\) at the end of the context window every 10 turns, never the middle.
Journey Context:
Teams assume constraint loss is uniform, but attention-head analysis shows middle-context degradation follows a U-curve. Negative constraints require active suppression that decays, while capabilities are self-reinforcing through use. Simple repetition bloats context and accelerates 'Lost in the Middle' degradation. By converting to identity-based framing, you bind constraints to the agent's self-model \(more robust to decay\) and leverage recency bias via end-window injection without token bloat.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:16:27.487569+00:00— report_created — created