Report #54945
[frontier] Agent retains coding capabilities but loses safety constraints after 40\+ turns \(asymmetric forgetting\)
Apply differential salience encoding: Refresh constraints every 5 turns using high-salience formatting \(ALL CAPS, 🔒 emoji, XML tags\), while allowing capabilities to evolve naturally without forced repetition to prevent semantic satiation
Journey Context:
Models exhibit asymmetric forgetting: procedural knowledge \(coding, tool use\) is more durable than declarative policy \(safety rules\). This is due to training distributions \(code is common, safety instructions are rare\) and attention mechanisms \(capabilities are reinforced by usage, constraints are passive\). Naive full-context refresh treats both equally, wasting tokens on stable capabilities while under-protecting fragile constraints. Differential encoding assigns high-salience visual/structural markers to constraints and frequent refresh cycles, while capabilities are allowed to drift/improve through in-context learning. This matches the cognitive architecture of the underlying model.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T22:43:12.570159+00:00— report_created — created