Report #45403
[frontier] Agent forgets constraints but remembers capabilities after 50\+ turns
Implement Constitutional Checkpoints: inject SHA-256 hashes of the original system prompt every 10 turns into conversation metadata, with automated verification against ground truth to detect semantic drift before behavior changes
Journey Context:
Attention mechanisms naturally decay low-salience tokens. Constitutional constraints are 'assumed' rather than exercised, causing them to fade from attention weights while capabilities \(actively used\) reinforce themselves. Unlike capabilities, constraints are negative rules that don't generate positive attention reinforcement. Cryptographic anchoring creates immutable reference points that force attention mechanisms to preserve constitutional tokens or trigger explicit reset protocols.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:40:52.026791+00:00— report_created — created