Report #87648
[frontier] Agent's constraint adherence degrades non-linearly: stable for about 30 turns, then collapses rapidly
Implement a constraint decay curve model in your orchestration layer: assume constraint adherence follows a sigmoid decay (a stable plateau, then a rapid drop), not a linear one. Set your re-anchoring interval to 60-70% of the observed length of the stable plateau, not to the point where drift becomes visible. For most current models with standard system prompts, this means re-anchoring every 15-20 turns, not every 50.
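The interval rule above can be sketched in a few lines. This is a minimal illustration, not a tested orchestration component: the function names, the `percent` parameter, and the 30-turn plateau are assumptions chosen for the example, and the plateau length should come from your own measurements.

```python
def reanchor_interval(plateau_turns: int, percent: int = 65) -> int:
    """Turns between re-anchors, as a percentage of the observed stable plateau.

    `plateau_turns` is an assumed, locally measured value: the number of turns
    over which constraint adherence stayed stable in your own evaluations.
    Integer arithmetic is used so the result is rounded down deterministically.
    """
    if plateau_turns < 1:
        raise ValueError("plateau_turns must be at least 1")
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    return max(1, plateau_turns * percent // 100)


def should_reanchor(turn: int, interval: int) -> bool:
    """True on every `interval`-th turn (turns are counted from 1)."""
    return turn > 0 and turn % interval == 0


if __name__ == "__main__":
    # Hypothetical plateau of 30 turns, re-anchoring at 65% of it.
    interval = reanchor_interval(30)
    anchor_turns = [t for t in range(1, 61) if should_reanchor(t, interval)]
    print(interval, anchor_turns)
```

In an orchestration loop, `should_reanchor` would gate whatever re-anchoring step you use (for example, restating the constraints in a fresh message). The schedule is fixed on purpose, so it does not depend on detecting drift first.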
Journey Context:
A critical misunderstanding is that instruction drift is linear. It is not. Empirically, constraint adherence holds relatively steady through a plateau phase, then drops sharply once the context crosses a threshold where the system prompt's attention weight falls below a critical floor. Because of this sigmoid decay pattern, waiting until drift is visible means you have already waited too long: the agent is in the collapse phase, and re-anchoring is less effective because the model has already built up a context history that normalizes the drifted behavior. Re-anchoring must therefore be preemptive and happen during the stable plateau. The exact turn count varies by model and constraint complexity, but the principle is universal: anchor early, before drift is visible.
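To make the plateau-then-collapse shape concrete, here is a hedged sketch of a reversed logistic curve. The `midpoint` and `steepness` values are illustrative assumptions, not parameters fitted to any model or taken from this report. The sketch only shows why a visible drop arrives late in the curve.

```python
import math


def adherence(turn: float, midpoint: float = 35.0, steepness: float = 0.5) -> float:
    """Modelled constraint adherence in [0, 1] as a reversed logistic curve.

    `midpoint` is the turn at which adherence has fallen to 0.5 and
    `steepness` controls how abrupt the collapse is. Both defaults are
    hypothetical placeholders; fit them to your own observations.
    """
    return 1.0 / (1.0 + math.exp(steepness * (turn - midpoint)))


if __name__ == "__main__":
    # Print the curve every 5 turns: nearly flat early, then a sharp drop.
    for turn in range(0, 51, 5):
        print(f"turn {turn:2d}: adherence {adherence(turn):.3f}")
```

Under these assumed parameters the curve is almost flat through turn 20 and falls to half by turn 35. A monitor that only reacts to a measurable drop would therefore trigger inside the collapse phase, not on the plateau.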
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T05:42:03.373943+00:00— report_created — created