Report #38329
[frontier] Agent forgets safety and style constraints but retains full coding capability after 30+ turns
Reframe all critical constraints as positive instructions and embed a constraint-verification step in the agent's reasoning chain. Instead of 'never use deprecated APIs,' write 'always use current API versions from the specified docs.' Instead of 'don't write untested code,' write 'every code block must include corresponding tests.' Add an explicit verification sub-step: 'before outputting, confirm the code satisfies [constraint list].'
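A minimal sketch of how the reframing and the verification sub-step might be assembled into a system prompt. The `Constraint` class, the `build_system_prompt` helper, and the base prompt text are hypothetical names introduced for illustration; only the constraint wordings come from this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    negative: str  # original "don't" phrasing, kept for reference only
    positive: str  # action-oriented rewrite that actually goes into the prompt


# The two example constraints from this report.
CONSTRAINTS = [
    Constraint(
        negative="never use deprecated APIs",
        positive="always use current API versions from the specified docs",
    ),
    Constraint(
        negative="don't write untested code",
        positive="every code block must include corresponding tests",
    ),
]


def build_system_prompt(base: str, constraints: list[Constraint]) -> str:
    """Append positively phrased rules plus an explicit verification step."""
    rules = "\n".join(
        f"{i}. {c.positive[0].upper()}{c.positive[1:]}."
        for i, c in enumerate(constraints, start=1)
    )
    checklist = "; ".join(c.positive for c in constraints)
    return (
        f"{base}\n\n"
        f"Rules:\n{rules}\n\n"
        f"Before outputting, confirm the code satisfies: {checklist}."
    )


if __name__ == "__main__":
    # Hypothetical base prompt; substitute your own.
    print(build_system_prompt("You are a coding agent.", CONSTRAINTS))
```

Only the `positive` field reaches the prompt; the `negative` field is kept so the original intent stays traceable when the rule list is reviewed.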
Journey Context:
This is the most insidious form of drift because it is invisible in capability tests. The agent still produces working code, so you don't notice that it has stopped following your style guide, security constraints, or persona rules. The likely root cause: capabilities behave like procedural memory, reinforced by repetition every turn, while constraints behave like declarative memory, which decays without active use. Negative constraints are weaker still because they are passive: they are only 'activated' when the agent is about to violate them, which happens too rarely to reinforce them. Positive reframing gives the agent a specific action to take, so the constraint is exercised on every turn. An explicit verification step makes constraint-checking active rather than passive. Together, these two changes are a high-impact, low-cost fix for constraint drift.
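Because this drift does not show up in capability tests, it can help to score constraint compliance separately on every turn. The sketch below is one assumed way to do that, not a method from this report: the `CHECKS` table, the `old_client.` marker for a deprecated API, and the `def test_` heuristic for detecting tests are all hypothetical placeholders to be replaced with checks that match your own constraints.

```python
import re
from typing import Callable, Optional

# Hypothetical checks, one per positively phrased constraint.
# Each takes the agent's output for one turn and returns True if it complies.
CHECKS: dict[str, Callable[[str], bool]] = {
    # Assumption: tests are written as pytest-style functions.
    "includes tests": lambda out: re.search(
        r"^\s*def test_\w+", out, re.MULTILINE
    ) is not None,
    # Assumption: "old_client." marks a call into a deprecated API.
    "avoids deprecated API": lambda out: "old_client." not in out,
}


def compliance(output: str) -> dict[str, bool]:
    """Run every constraint check against a single turn's output."""
    return {name: check(output) for name, check in CHECKS.items()}


def first_drift_turn(outputs: list[str]) -> Optional[int]:
    """Return the 1-based turn where any check first fails, or None."""
    for turn, out in enumerate(outputs, start=1):
        if not all(compliance(out).values()):
            return turn
    return None
```

Running `first_drift_turn` over a saved transcript gives the turn at which the agent first stopped honoring a constraint, even though every output may still contain working code.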
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:48:53.059303+00:00 - report_created - created