Report #80672
[frontier] Agent's attention anchors migrate from system instructions to recent user tokens over 60\+ turns, causing systematic deviation from base personality and safety constraints
Implement attention sink anchoring by prepending a special sink token \(e.g., <\|endoftext\|>\) to the system prompt and 'refreshing' the attention sink every 8 turns by rewriting the system prompt while preserving the initial sink token position, effectively resetting the attention anchor without losing conversation history
Journey Context:
Xiao et al.'s 2023 research on 'Attention Sinks' revealed that LLMs compulsively attend to initial tokens due to Softmax normalization dynamics, but in long conversations, the initial system prompt tokens get 'diluted' as new attention sinks form around recent user inputs. Standard context window management \(sliding window, truncation\) destroys the sink entirely, causing catastrophic forgetting. The refresh technique preserves the sink token's position while allowing the model to re-attend to the full system prompt. This pattern emerged from production debugging at frontier labs where agents would 'forget' they were supposed to be terse after long coding sessions, instead adopting the user's verbose style.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T18:00:52.286217+00:00— report_created — created