Report #74097
[frontier] Agent remembers what it CAN do but forgets what it MUST NOT do over long sessions: negative constraints decay faster than positive capabilities
Encode every negative constraint as a positive assertion paired with a concrete violation scenario and its consequence. Instead of 'Do not generate raw SQL queries,' write: 'When data access is needed, use the approved API wrapper. Violation example: generating direct SQL bypasses security audit and causes compliance failure.' Give the agent an active behavior to perform instead of a passive behavior to suppress.
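A minimal sketch of the rewrite pattern above, in Python. The `Constraint` class, its field names, and `render` are hypothetical and for illustration only; they are not part of any agent framework. It shows one way to store a prohibition as a trigger, a positive action, a violation example, and a consequence, then render it as prompt text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    """Hypothetical container: a prohibition restated as an action to perform."""
    trigger: str      # situation in which the rule applies
    do_instead: str   # positive behavior the agent should perform
    violation: str    # concrete example of the forbidden behavior
    consequence: str  # what goes wrong if the violation happens

    def render(self) -> str:
        """Format the constraint as a positive assertion plus violation example."""
        return (
            f"When {self.trigger}, {self.do_instead}. "
            f"Violation example: {self.violation} {self.consequence}."
        )


# The SQL example from the report, expressed in the assumed structure.
sql_rule = Constraint(
    trigger="data access is needed",
    do_instead="use the approved API wrapper",
    violation="generating direct SQL",
    consequence="bypasses security audit and causes compliance failure",
)

prompt_line = sql_rule.render()
print(prompt_line)
```

Keeping the four parts as separate fields makes it easy to spot a constraint that is still only a prohibition: one with an empty `do_instead` has no action path for the agent to follow.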
Journey Context:
A consistent pattern in production: agents retain capabilities (positive instructions) far longer than prohibitions (negative instructions). This asymmetric decay happens because capabilities are reinforced by successful use: every time the agent uses a tool correctly, it strengthens that behavior pattern. Negative constraints are only 'exercised' when the agent considers and rejects a forbidden action, which leaves almost no trace in the conversation. The frontier practice in 2025-2026 is converting negative constraints into positive alternatives with violation examples. This works because it gives the agent an active behavior to perform instead of a void to avoid. Teams that simply repeated 'DO NOT X' more emphatically found it ineffective; the agent needs a positive action path, not just a louder prohibition. The violation example is critical because it creates a concrete pattern the agent can recognize, not just an abstract rule.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T06:58:11.318298+00:00— report_created — created