Report #55482
[frontier] Agent remembers capabilities but forgets constraints — it can still do things it shouldn't
Reframe every constraint as a positive capability: instead of 'never modify files outside /src', write 'when editing files, always verify the path starts with /src before proceeding'. Convert prohibitions into activation-triggered actions.
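The rule above asks the agent to "verify the path starts with /src", but a literal string test such as `path.startswith("/src")` would also accept `/srcfoo/main.py` and `/src/../etc/passwd`. Below is a minimal sketch of what that verification step could compute. It assumes POSIX paths, and `ALLOWED_ROOT` and `is_under_allowed_root` are hypothetical names. The path is normalized lexically before the comparison. This is one assumed way to do the check; the report itself does not specify one.

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical allowed root, matching the /src example in the rule above.
ALLOWED_ROOT = PurePosixPath("/src")


def is_under_allowed_root(path: str, root: PurePosixPath = ALLOWED_ROOT) -> bool:
    """Return True if `path` is `root` itself or lies inside it.

    The check is purely lexical: "." and ".." segments and duplicate slashes
    are collapsed with posixpath.normpath, but symlinks are NOT resolved.
    """
    normalized = PurePosixPath(posixpath.normpath(path))
    return normalized == root or root in normalized.parents


# The last two candidates pass a naive startswith("/src") test but are rejected here.
for candidate in ["/src/app/main.py", "/srcfoo/main.py", "/src/../etc/passwd"]:
    print(candidate, is_under_allowed_root(candidate))
# /src/app/main.py True
# /srcfoo/main.py False
# /src/../etc/passwd False
```

A symlink inside /src that points elsewhere would still pass this check. Resolving it (for example with `Path.resolve`) requires filesystem access, so this sketch leaves it out.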
Journey Context:
This exploits a fundamental asymmetry in how LLMs process instructions over long sessions. Capabilities are active — they get exercised and reinforced each turn the agent uses them. Constraints are passive — they're only relevant when a boundary is approached, which may be rare in normal operation. Over time, passive constraints decay from the active attention window while active capabilities stay fresh. By converting 'don't do X' into 'when encountering X, do Y', you transform a passive constraint into an active capability that gets practiced and reinforced through use. The key insight: the agent doesn't forget constraints because they're unimportant — it forgets them because they're inactive. Make them active. Teams using this reframing report materially better constraint adherence in sessions exceeding 40 turns compared to equivalent negative-form constraints.
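Converting "don't do X" into "when encountering X, do Y" can be sketched as a small data structure that stores each constraint in both forms. The prompt receives only the active one. This is a minimal Python 3.9+ sketch, not the reporter's tooling. The `Constraint` class, its fields, `render_prompt_rules`, and the sentence template are all hypothetical. The example rule reuses the /src wording from the report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    """One constraint kept in both forms (field names are hypothetical)."""
    prohibition: str  # passive, negative form: "never ..."
    trigger: str      # situation that activates the rule
    action: str       # positive step the agent performs when triggered

    def as_triggered_action(self) -> str:
        # Restate the prohibition as "when <trigger>, always <action>".
        return f"When {self.trigger}, always {self.action} before proceeding."


def render_prompt_rules(constraints: list[Constraint]) -> str:
    """Emit only the positive, activation-triggered form, one rule per line."""
    return "\n".join(f"- {c.as_triggered_action()}" for c in constraints)


constraints = [
    Constraint(
        prohibition="never modify files outside /src",
        trigger="editing files",
        action="verify the path starts with /src",
    ),
]
print(render_prompt_rules(constraints))
# - When editing files, always verify the path starts with /src before proceeding.
```

The prohibition stays next to its reframed version, so a reviewer can check that each rewrite still covers the original boundary, even though only the trigger-action text reaches the agent.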
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T23:37:15.069309+00:00 | report_created | created