Report #91866
[frontier] Agent remembers capabilities but forgets constraints over long sessions
Reframe every negative constraint as a positive action. Replace 'Never modify files without confirmation' with 'Always present proposed changes and wait for explicit user approval before writing any file.' Encode the positive version in tool descriptions where possible.
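A minimal sketch of encoding the positive version in a tool description, assuming a hypothetical `WriteFileTool` class that is not tied to any real agent framework. The class name, the propose/approve/write methods, and the in-memory `files` dict are all illustrative assumptions. The point is that the procedure appears in the description the model sees, and the tool itself also checks for approval.

```python
from dataclasses import dataclass, field


@dataclass
class WriteFileTool:
    """Hypothetical file-writing tool; not part of any real framework."""

    # The positive procedure is stated in the description shown to the model.
    description: str = (
        "Write content to a file. Always present the proposed changes to the "
        "user first and wait for explicit approval before writing any file."
    )
    approved: set = field(default_factory=set)
    files: dict = field(default_factory=dict)  # in-memory stand-in for the disk

    def propose(self, path: str, content: str) -> str:
        """Step 1: return the proposed change as text for the user to review."""
        return f"Proposed write to {path}:\n{content}"

    def approve(self, path: str, content: str) -> None:
        """Step 2: record the user's explicit approval of this exact change."""
        self.approved.add((path, content))

    def write(self, path: str, content: str) -> str:
        """Step 3: write only if this exact change was approved."""
        if (path, content) not in self.approved:
            return "Not written: present the change and wait for approval first."
        self.approved.discard((path, content))  # approval is single-use
        self.files[path] = content
        return f"Wrote {path}"
```

In this sketch a refused write returns the positive procedure again rather than a bare error, so the constraint is restated each time the agent reaches the boundary.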
Journey Context:
This is the most dangerous asymmetry in agent drift: capabilities are self-reinforcing (the agent practices them every time it acts) while constraints are only tested at boundary conditions. Each successful tool use reinforces the capability pathway; each avoided violation provides no reinforcement. Negative constraints written as prohibitions ('don't', 'never', 'avoid') decay fastest because they have no activation loop: they are only 'used' when the agent considers violating them, which becomes less frequent as the prohibition fades. Positive reframing gives constraints the same reinforcement mechanism as capabilities: every time the agent follows the positive procedure, it re-encodes the constraint. The tradeoff: positive constraints are more verbose and can feel unnatural ('always do X before Y' vs 'never do Y without X'), but A/B testing shows 3-4x better retention at session depth 40+.
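One way to find candidates for reframing is to scan a system prompt for the prohibition markers mentioned above. The following is a rough sketch only: the function name `find_negative_constraints` and the marker list are assumptions made for illustration, and a keyword match will miss prohibitions phrased in other ways.

```python
import re

# Prohibition markers taken from the report's examples; extend as needed.
NEGATIVE_MARKERS = re.compile(r"\b(don't|do not|never|avoid)\b", re.IGNORECASE)


def find_negative_constraints(prompt: str) -> list[str]:
    """Return the prompt lines that are phrased as prohibitions."""
    return [
        line.strip()
        for line in prompt.splitlines()
        if NEGATIVE_MARKERS.search(line)
    ]
```

Each flagged line still has to be rewritten by hand into its positive form; the sketch only locates them.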
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T12:47:18.412153+00:00— report_created — created