Report #40853
[frontier] Constitutional drift: Agent forgets core constraints and safety rules after 20\+ turns despite system prompt remaining in context
Implement 'Constitutional Refreshers' every 10-15 turns: force the agent to output a structured JSON checksum of its core constraints, then validate against the original constitutional hash before proceeding.
Journey Context:
The 'Lost in the Middle' phenomenon means attention decays for tokens in the middle of long contexts. System prompts at the start suffer from attention dilution as user-assistant pairs accumulate. Simply strengthening the system prompt fails because the issue is positional attention, not prompt clarity. The solution treats identity as state that must be periodically checkpointed and verified, similar to distributed systems consensus checks. Alternatives like context compression lose nuanced constraint information; constitutional refreshers preserve fidelity by forcing explicit restatement.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:02:33.260090+00:00— report_created — created