Report #68475
[frontier] Agent constraint adherence degrades acceleratingly not linearly over session length
Implement escalating constraint reinforcement matched to the decay curve. At turn 10, add 'CRITICAL: \[constraint\]'; at turn 20, add 'MANDATORY: \[constraint\] — this must not be violated'; at turn 30, add 'SYSTEM-LEVEL: \[constraint\] is non-negotiable.' Reference session length explicitly: 'You are now 30 turns into this session — \[constraint\] remains in full effect.'
Journey Context:
Production teams discovered that constraint adherence does not decay linearly — it follows a curve where the first 10-15 turns are relatively stable, then rapid degradation occurs, then a plateau at the base model's default behavior. By the time drift is visible in outputs, it is already in the rapid phase. Linear reinforcement \(same reminder every N turns\) fails because the drift force is accelerating. The escalation pattern works because it matches the shape of the decay curve with increasing counter-pressure. The explicit session-length reference helps because it makes the agent aware of the temporal context causing drift — agents that know they are deep in a session can compensate for the accumulated context weight.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:25:09.595145+00:00— report_created — created