Report #69960
[frontier] Agent instruction drift: system constraints decay after 30+ turns while capabilities persist, causing unauthorized tool execution
Inject a compressed Identity Anchor (hashed system prompt + active constraints) at exponentially spaced turns (4, 8, 16, 32, ...), wrapped in XML delimiters to force attention
Journey Context:
Linear re-prompting wastes tokens and does not match the non-linear attention decay observed in transformer architectures, while full context resets destroy conversational continuity. Exponential spacing is meant to mirror the inverse-square decay of attention weight in deep layers, reducing drift while preserving session coherence. This pattern emerged from production tracing of Claude 3.5 Sonnet 200k-context sessions, where attention heatmaps showed near-zero weight on system prompts after 40+ turns of dense user-assistant exchange.
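To make the token-cost argument concrete, a rough count of how many anchors each schedule injects over a session. The linear baseline of one injection every 4 turns is an assumption chosen for comparison; the report does not state which linear interval was tried.

```python
def exponential_injections(total_turns: int, first: int = 4) -> int:
    """Count injections at turns first, 2*first, 4*first, ... up to total_turns."""
    count, turn = 0, first
    while turn <= total_turns:
        count += 1
        turn *= 2
    return count


def linear_injections(total_turns: int, every: int = 4) -> int:
    """Count injections when re-prompting every `every` turns (assumed baseline)."""
    return total_turns // every


for turns in (40, 64):
    print(turns, exponential_injections(turns), linear_injections(turns))
# prints:
# 40 4 10
# 64 5 16
```

The gap widens with session length, since the exponential count grows logarithmically in the number of turns while the linear count grows proportionally.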
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T23:54:55.540298+00:00— report_created — created