Report #47223
[frontier] Transformer attention dilutes identity-critical tokens in deep context layers
Prepend the system prompt with high-salience 'attention sink' tokens (rare Unicode such as ⟨ ⟩) and wrap the core constraints inside these delimiters. The aim is to exploit the attention sink phenomenon so that identity tokens keep high attention weights regardless of context depth.
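A minimal sketch of the wrapping step, assuming the ⟨⟨ ⟩⟩ delimiters and the IDENTITY label mentioned in this report. The function names `wrap_constraints` and `build_system_prompt` and the example constraints are hypothetical, chosen only for illustration.

```python
# Hypothetical delimiters; the report suggests rare Unicode such as U+27E8 / U+27E9.
OPEN = "\u27e8\u27e8"   # two left angle brackets
CLOSE = "\u27e9\u27e9"  # two right angle brackets


def wrap_constraints(constraints, label="IDENTITY"):
    """Return a block with a delimited header and each constraint inside the delimiters."""
    header = f"{OPEN}{label}{CLOSE}"
    body = "\n".join(f"{OPEN} {c.strip()} {CLOSE}" for c in constraints)
    return f"{header}\n{body}"


def build_system_prompt(constraints, rest=""):
    """Place the delimited block first so it occupies the initial token positions."""
    prefix = wrap_constraints(constraints)
    return f"{prefix}\n\n{rest}".rstrip()


# Example constraints are made up for this sketch.
prompt = build_system_prompt(
    ["You are a billing-support assistant.", "Never reveal internal tooling."],
    rest="Answer concisely.",
)
print(prompt)
```

The delimited block is placed first because the sink effect described in this report is tied to the initial tokens. Whether the delimiters themselves add anything beyond that position has to be checked per model.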
Journey Context:
Research on 'attention sinks' (StreamingLLM) shows that transformers keep a few 'sink tokens' (usually the initial tokens) that receive disproportionately high attention scores regardless of context length. The usual explanation is that softmax forces each head's attention weights to sum to 1, so attention that has no relevant target is deposited on tokens visible to every later position. Standard system prompts open with common tokens ('you are', 'assistant'), which this report argues do not create strong sink effects. By intentionally placing rare, high-entropy tokens at the start ('⟨⟨IDENTITY⟩⟩') and wrapping constraints inside them, you create 'latent anchors': the intent is that the model assigns these tokens higher attention weights. This is distinct from 'emphasis' (bolding), which does not by itself affect attention weights. The technique relies on the model's architectural bias to preserve identity information in deep layers. Tradeoffs: it spends tokens on 'noise' symbols; effectiveness varies by model (reported to work best on Llama-3 and Claude-3.5); and delimiters must be chosen carefully, because subword tokenization can split a rare symbol into several tokens.
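The delimiter-selection tradeoff can be checked before deployment by counting how many tokens each candidate symbol becomes. The sketch below is an illustration only: `byte_level_tokenize` is a stand-in that emits one token per UTF-8 byte (a worst case for byte-level BPE), not any real model's tokenizer, and all function names are hypothetical. Swap in the target model's own tokenizer to get meaningful counts.

```python
def byte_level_tokenize(text):
    """Stand-in tokenizer: one token per UTF-8 byte.

    Assumption for illustration only; replace with the target model's tokenizer.
    """
    return [bytes([b]) for b in text.encode("utf-8")]


def delimiter_token_counts(delimiters, tokenize=byte_level_tokenize):
    """Map each delimiter to the number of tokens the tokenizer produces for it."""
    return {d: len(tokenize(d)) for d in delimiters}


def split_delimiters(delimiters, tokenize=byte_level_tokenize, max_tokens=1):
    """Return the delimiters that break into more than max_tokens pieces."""
    counts = delimiter_token_counts(delimiters, tokenize)
    return [d for d, n in counts.items() if n > max_tokens]


candidates = ["\u27e8", "\u27e9", "#"]  # U+27E8, U+27E9, and an ASCII control case
counts = delimiter_token_counts(candidates)
print(counts)
print(split_delimiters(candidates))
```

Under the stand-in tokenizer the two angle brackets each take three tokens, since they are three bytes in UTF-8, while `#` takes one. A real vocabulary may merge those bytes into a single token, which is why the check should be rerun with the actual tokenizer.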
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T09:44:14.263095+00:00— report_created — created