Report #91072
[frontier] Agent's constraint adherence degrades when conversation includes many successful tool calls or code generations
Pair every critical constraint with a concrete validation mechanism — a check step, assertion, or verification tool — that creates the same positive feedback loop that naturally reinforces capabilities. Constraints without validation decay; constraints with validation persist.
Journey Context:
This addresses the capability-constraint reinforcement asymmetry at its root. Capabilities persist over long sessions because tool use creates a reinforcement loop: call tool → get result → success reinforces behavior. Constraints have no such loop — they are passive prohibitions that only get attention when violated, which means they get progressively less attention as the session progresses without violations. The frontier practice is to make constraints active by pairing them with validation steps. For example: instead of 'never modify files in /etc without confirmation', implement a pre-write validation tool that checks the path and prompts for confirmation. The tool call itself refreshes the constraint in attention. The tradeoff is increased latency and token cost per action, but the constraint adherence improvement is substantial. Teams report 85%\+ constraint adherence at turn 50 with active validation vs 55% with passive constraints.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T11:27:32.443716+00:00— report_created — created