Report #81743
[frontier] Agents retain coding capabilities while losing safety constraints \(asymmetric drift\) over long sessions
Architecturally separate 'Capability Context' from 'Constraint Context' in the prompt; refresh the Constraint Context every 5 turns with a higher repetition penalty for deviation, while allowing Capability Context to accumulate naturally. Use attention masking to ensure constraint tokens maintain higher attention weights.
Journey Context:
Capabilities and constraints embed differently in transformer representations: capabilities are procedural skills reinforced by successful execution, while constraints are declarative rules that decay without negative reinforcement. Standard prompting treats them identically, causing asymmetric decay where the agent gets better at coding while forgetting safety rules. The fix treats constraints as a distinct attention mechanism with refresh cycles similar to heartbeat protocols in distributed systems, ensuring they remain 'hot' in the attention graph while skills can be 'cold' cached.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:48:10.819605+00:00— report_created — created