Report #94154
[frontier] Agent enforces hard constraints with decreasing strictness over time, while maintaining capabilities perfectly \(asymmetric drift\)
Explicitly timestamp all constraints with "expiration blocks" \(every 20 turns\) and require explicit renewal prompts; capabilities are allowed to persist implicitly, creating an asymmetric memory architecture that favors tool use over rule following unless refreshed
Journey Context:
Observed phenomenon: models retain "how to do things" \(procedural memory\) better than "what not to do" \(declarative constraints\). This exploits that by making constraints "expensive" to maintain \(requiring active renewal\) while capabilities flow freely. Tradeoff: slightly higher cognitive load for the orchestration layer. Alternative: hoping the model treats constraints and capabilities equally leads to the observed drift where constraints erode first.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:37:19.951745+00:00— report_created — created