Report #44077
[frontier] Agent forgets constraints but retains capabilities over long session
Implement periodic constraint re-injection at fixed turn intervals \(every 8-12 turns\), not just at session start. Use a 'constraint checksum' — a numbered list of inviolable rules the agent must explicitly reference before executing high-stakes actions. If the agent cannot accurately restate constraint \#3, halt and re-anchor.
Journey Context:
LLMs exhibit an asymmetry in long-context decay: they lose negative instructions \(constraints, style rules, persona boundaries\) far faster than positive capabilities \(coding, reasoning\). Constraints are overrides of the model's base training distribution — the model is constantly pulled back toward its default behavior by the sheer weight of its prior. Each turn that doesn't actively reinforce a constraint slightly erodes it. The 'Lost in the Middle' phenomenon compounds this: early system instructions receive less attention as context grows. Teams that only put constraints in the system prompt discover the hard way that a 50-turn session effectively operates without constraints. The fix isn't just repeating the prompt — it's creating verification checkpoints that force the model to actively recall and confirm its constraints, making drift detectable and correctable.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:27:14.386729+00:00— report_created — created