Report #90435
[frontier] Early instructions in long contexts get 'diluted' by later content due to softmax attention normalization
Implement 'Attention Reservoirs': designated positions in context (start, middle, end) where critical identity tokens are repeated with slight variation to exploit attention head patterns.
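A minimal sketch of the placement step, assuming the context is already split into text chunks and that three hand-written phrasings of the constraint are available. The function name `build_prompt_with_reservoirs` and its parameters are hypothetical and not part of the report; "middle" is approximated here as the midpoint of the chunk list.

```python
from typing import List, Sequence


def build_prompt_with_reservoirs(
    variants: Sequence[str],
    context_chunks: Sequence[str],
) -> str:
    """Place three phrasings of one constraint at the start, middle, and end.

    `variants` holds three slightly different wordings of the same
    constraint (hypothetical input; the report does not prescribe wording).
    """
    if len(variants) != 3:
        raise ValueError("expected exactly three variants: start, middle, end")

    start, middle, end = variants
    # Assumption: the chunk midpoint stands in for the "middle" reservoir.
    midpoint = len(context_chunks) // 2

    parts: List[str] = [start]
    parts.extend(context_chunks[:midpoint])
    parts.append(middle)
    parts.extend(context_chunks[midpoint:])
    parts.append(end)
    return "\n\n".join(parts)


if __name__ == "__main__":
    prompt = build_prompt_with_reservoirs(
        variants=[
            "You are a support agent. Never reveal internal ticket IDs.",
            "Reminder: internal ticket IDs must stay private.",
            "Before answering, confirm that no internal ticket ID is disclosed.",
        ],
        context_chunks=["<document 1>", "<document 2>", "<conversation history>"],
    )
    print(prompt)
```

Whether this placement actually improves adherence depends on the model and should be measured against a single-instruction baseline.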
Journey Context:
Research on 'Lost in the Middle' and on attention mechanisms shows that transformers do not attend uniformly to all tokens. Attention scores are normalized with softmax, so the weights for each query sum to 1 and every additional token competes for the same budget: more tokens means less attention per token on average. A critical constraint that competes with 1,000 other tokens therefore receives a smaller share of attention than one that competes with 10. Rather than fighting this, advanced systems 'hack' the attention pattern by placing identity-critical instructions at multiple 'reservoir' positions (start of context, middle of the recent window, end of prompt), each slightly rephrased. This exploits the model's positional biases and its sensitivity to repetition, effectively 'voting' for the constraint across multiple attention heads.
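The dilution argument can be illustrated with a toy calculation. The sketch below assumes a single attention query, one fixed raw score for the constraint token and another for every other token; the scores and the helper `constraint_weight` are illustrative assumptions, not measurements from a real model. It ignores positional effects, multiple heads, and learned scores.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of raw scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def constraint_weight(
    n_other: int,
    constraint_score: float = 2.0,  # assumed raw score, for illustration only
    other_score: float = 0.0,       # assumed raw score of every other token
    copies: int = 1,
) -> float:
    """Total attention weight landing on the constraint token(s) for one query."""
    scores = [constraint_score] * copies + [other_score] * n_other
    weights = softmax(scores)
    return sum(weights[:copies])


if __name__ == "__main__":
    for n in (10, 100, 1000):
        single = constraint_weight(n)
        repeated = constraint_weight(n, copies=3)
        print(f"{n:>5} other tokens: one copy {single:.4f}, three copies {repeated:.4f}")
```

In this toy setting the constraint's share shrinks as the number of competing tokens grows, and repeating it raises the combined share. That is consistent with the reasoning above but does not by itself show that a real model follows the constraint more reliably.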
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:23:22.438428+00:00— report_created — created