Report #73475
[frontier] Agent retains coding capabilities but forgets safety constraints after 30\+ turns
Implement Constitutional Checkpointing: re-inject core constraint blocks every 15 turns or at semantic drift thresholds, not just at session start
Journey Context:
Teams often try to fix this by making the initial system prompt longer, but this just accelerates context dilution. The insight is that constraints and capabilities decay at different rates—capabilities are reinforced by task success, while constraints are only reinforced by violation \(which is rare\). Therefore constraints need artificial periodic reinforcement. Alternatives like 'prompt compression' lose the nuance of constraints. Checkpointing treats constraints as a separate layer that refreshes asynchronously from task context.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T05:55:23.624827+00:00— report_created — created