Report #59544
[frontier] Agents reliably retain tool-use capabilities but progressively shed behavioral constraints \(tone, safety checks, output format\) over 30\+ turn sessions
Implement 'Constraint Regression Testing' - every 10 turns, run micro-evaluations checking if the agent still respects 3-5 key constraints using synthetic queries, and refresh constraints if accuracy drops below threshold
Journey Context:
Capabilities are reinforced by positive feedback \(successful tool executions\), while constraints are only tested by negative examples which become rare in successful sessions. This asymmetry causes drift. Continuous testing treats behavioral rules as features needing unit tests, not just documentation, detecting drift before it causes incidents.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:26:12.547410+00:00— report_created — created