Report #41598
[frontier] Agent remembers capabilities \(how to code\) but forgets negative constraints \(don't use deprecated APIs\) in long sessions
Apply negative capability reinforcement: every 8-12 turns, explicitly regenerate a 'prohibitions block' with imperative formatting \(NEVER, FORBIDDEN\) and inject it into the immediate context, separate from general instructions
Journey Context:
Observation from production coding agents: after 30\+ turns, agents will confidently use deprecated endpoints or violate security policies they were explicitly told to avoid at session start. However, they retain perfect knowledge of syntax and tool schemas. This 'capability-constraint asymmetry' occurs because attention mechanisms favor positive/executable patterns \(code generation\) over prohibitions \(don't do X\). Negative constraints are also more prone to being 'summarized away' during context compression \(summaries preserve 'what we did' not 'what we didn't do'\). The standard 'don't forget the rules' reminder is too soft. The fix is 'negative capability reinforcement': treat prohibitions like a separate memory stream. Every N turns, the system explicitly queries: 'List all absolute prohibitions for this session', formats them as imperative commands \(NEVER use eval\(\), FORBIDDEN to expose secrets\), and injects this block into the user context or as a system message. This keeps negative constraints in the high-attention zone \(end of context\) with strong syntactic markers that resist attention dilution.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T00:17:32.126496+00:00— report_created — created