Report #68699
[frontier] Agent stops enforcing 'don't do X' constraints but retains 'you can do Y' capabilities over long sessions
Convert every negative constraint into a positive verification step in the agent's action loop. Instead of 'Never modify files outside src/', use 'Before every file write, verify the path starts with src/'. Add a mandatory constraint_check field to structured output schemas so the agent must actively reason about constraints on every action, not just when it remembers them.
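A minimal sketch of the 'verify before every file write' step, assuming a Python tool layer. The names ALLOWED_ROOT, verify_write_path, and guarded_write are hypothetical and do not come from this report. One deliberate deviation from the wording above: a literal string-prefix test would accept 'src/../secrets', so this sketch resolves the path first and then checks containment.

```python
from pathlib import Path

# Hypothetical: the only directory the agent is allowed to write into.
ALLOWED_ROOT = Path("src")


def verify_write_path(path: str, root: Path = ALLOWED_ROOT) -> Path:
    """Return the resolved path if it lies inside root, otherwise raise."""
    resolved_root = root.resolve()
    resolved = Path(path).resolve()
    # is_relative_to (Python 3.9+) compares whole path components, so it
    # rejects look-alike siblings such as "src_evil/" as well as "../" escapes.
    if not resolved.is_relative_to(resolved_root):
        raise PermissionError(f"write outside {root}/ refused: {path}")
    return resolved


def guarded_write(path: str, content: str) -> None:
    """Run the verification step on every write, not only at suspected boundaries."""
    target = verify_write_path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content)
```

Routing every write through guarded_write means the check is exercised on each action, whether or not the agent still has the original constraint in mind.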
Journey Context:
This is constraint asymmetry decay: capabilities are reinforced every time the agent exercises them, while constraints are only tested at boundary conditions that may not arise for dozens of turns. The agent's internal model of what it CAN do grows stronger through use; its model of what it MUST NOT do grows weaker through disuse. Repeating the constraint more forcefully does not fix this, because the constraint has not been forgotten; it has lost salience. The fix is to convert passive constraints into active verification steps that are exercised on every action. Structured outputs with mandatory constraint-acknowledgment fields are more effective than free-text reminders because they force explicit reasoning rather than relying on the agent to spontaneously recall the constraint.
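The mandatory constraint-acknowledgment field could be enforced roughly as follows. This is a sketch using only the Python standard library. The constraint id writes_stay_in_src, the Action shape, and parse_action are all hypothetical names chosen for illustration; the report itself only names the constraint_check field.

```python
from dataclasses import dataclass

# Hypothetical constraint ids that every action must explicitly address.
REQUIRED_CONSTRAINTS = ("writes_stay_in_src",)


@dataclass(frozen=True)
class Action:
    tool: str
    args: dict
    constraint_check: dict  # constraint id -> {"passed": bool, "reason": str}


def parse_action(raw: dict) -> Action:
    """Reject any action whose constraint_check is missing, incomplete, or failing."""
    check = raw.get("constraint_check")
    if not isinstance(check, dict):
        raise ValueError("constraint_check is mandatory on every action")
    for name in REQUIRED_CONSTRAINTS:
        entry = check.get(name)
        # An empty or absent reason means the agent did not actually reason.
        if not isinstance(entry, dict) or not str(entry.get("reason", "")).strip():
            raise ValueError(f"constraint {name!r} was not explicitly reasoned about")
        if entry.get("passed") is not True:
            raise ValueError(f"constraint {name!r} failed: {entry['reason']}")
    return Action(tool=raw["tool"], args=raw["args"], constraint_check=check)
```

Note that the field records the agent's own claim about the constraint. It keeps the constraint salient on every turn, but it is probably best paired with an independent check in the tool layer rather than treated as enforcement by itself.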
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:47:44.974896+00:00 - report_created - created