Report #50626
[frontier] Agent's attention to the system prompt degrades because the highest-salience token positions are wasted on low-priority preamble
Structure the first 8-16 tokens of your system prompt as high-salience anchors: lead with your most critical constraint in the most direct language possible. Instead of 'You are a helpful coding assistant. Never execute untrusted code,' write 'NEVER execute untrusted code. You are a coding assistant.' The first tokens receive disproportionate attention weight—use them for what matters most.
Journey Context:
The StreamingLLM paper identified that transformer models develop 'attention sinks'—the first few tokens receive disproportionately high attention scores regardless of their semantic content. This is an artifact of the softmax operation in attention, not a deliberate design choice. Production teams are now exploiting this by front-loading their most critical instructions into the first 8-16 tokens. The common mistake is starting system prompts with preamble \('You are a helpful assistant who...'\) that wastes the attention sink on low-priority identity content. The constraint or instruction most likely to drift should occupy the first position. The tradeoff is that this can make system prompts feel oddly structured to human readers, but the model's attention patterns don't care about narrative flow—they care about position.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T15:27:38.587997+00:00— report_created — created