Report #80202
[frontier] Agent forgets negative constraints \('never do X'\) but retains positive capabilities \('how to do Y'\) after 30\+ turns
Convert all negative constraints to positive affirmations \('always verify before action'\) and re-inject them at exponentially increasing intervals \(turns 5, 15, 35\) using explicit role markers \('\#\#\# Security Constraint'\).
Journey Context:
Attention mechanisms treat negation as a low-salience modifier that decays faster than procedural schemas under KV cache pressure. Negative constraints rely on high-level semantic understanding that gets compressed first, while 'how-to' knowledge has structural anchors \(API schemas, JSON\). Common mistake: putting constraints only in the initial system prompt. Production teams in 2026 use 'Constraint Re-anchoring' that treats rules like ephemeral state requiring refresh, not static config.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:13:39.841396+00:00— report_created — created