Report #62445
[frontier] Agent over-weights recent messages and ignores earlier instructions as session grows
Use 'instruction echoing': repeat the 2-3 most critical constraints in the most recent user message, or in a tool result that appears just before the agent's generation point. Structure long conversations so that the final message before generation always contains a constraint summary. The repetition is deliberate, not wasted tokens: in long sequences, instructions far from the generation point tend to carry less weight than recent ones.
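A minimal sketch of instruction echoing, assuming the conversation is held as a list of role/content dictionaries (a common chat format, not something this report specifies). The function name echo_constraints, the max_constraints parameter, and the reminder wording are hypothetical.

```python
from typing import Dict, List

Message = Dict[str, str]  # assumed shape: {"role": ..., "content": ...}


def echo_constraints(
    messages: List[Message],
    constraints: List[str],
    max_constraints: int = 3,
) -> List[Message]:
    """Return a copy of `messages` whose final user message ends with a
    short constraint summary. The input list is left unmodified."""
    if not messages or messages[-1]["role"] != "user":
        raise ValueError("expected the last message to be a user message")

    # Keep only the few most critical constraints (assumed ordered by priority).
    bullet_lines = "\n".join(f"- {c}" for c in constraints[:max_constraints])
    summary = "Reminder of constraints:\n" + bullet_lines

    # Shallow-copy each message so the caller's history is not mutated.
    echoed = [dict(m) for m in messages]
    echoed[-1]["content"] = f"{echoed[-1]['content']}\n\n{summary}"
    return echoed
```

Calling this immediately before each model request keeps the summary in the last message before generation, while the stored history stays free of accumulated reminders.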
Journey Context:
Transformer models tend to show a recency bias in practice: content close to the generation point generally influences the output more than content far from it, and instructions given early in a long context are followed less reliably. In a 100-turn conversation, the system prompt from turn 0 is far from the generation point at turn 100. This is better understood as a property of how these models behave than as a bug. Teams that treat it as a prompt engineering problem ('write better system prompts') are fighting the architecture. Teams that treat it as a context architecture problem ('ensure critical information is always near the generation point') are working with the architecture. The frontier practice is 'attention-aware context design': engineering the content and position of messages based on where the model's attention will be at generation time. Redundancy here is not merely acceptable but necessary.
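One way to make 'ensure critical information is always near the generation point' checkable is to measure how far back each constraint was last stated and flag the ones that have drifted too far. This is a rough sketch under stated assumptions: distance is counted in messages rather than tokens, a constraint counts as "mentioned" only on an exact substring match, and the max_distance threshold of 20 is an arbitrary illustration, not a measured value. Both function names are hypothetical.

```python
from typing import Dict, List

Message = Dict[str, str]  # assumed shape: {"role": ..., "content": ...}


def turns_since_last_mention(messages: List[Message], constraint: str) -> int:
    """Count how many messages separate the most recent mention of
    `constraint` from the end of the conversation (0 = last message).
    Returns len(messages) if the constraint never appears."""
    for distance, message in enumerate(reversed(messages)):
        if constraint in message["content"]:  # exact substring match (assumption)
            return distance
    return len(messages)


def constraints_to_reinject(
    messages: List[Message],
    constraints: List[str],
    max_distance: int = 20,  # illustrative threshold, not an empirical value
) -> List[str]:
    """Return the constraints whose last mention is more than
    `max_distance` messages away from the generation point."""
    return [
        c for c in constraints
        if turns_since_last_mention(messages, c) > max_distance
    ]
```

The returned list can then be placed in the final user message or in a tool result, so that only constraints that have actually drifted are repeated.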
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:18:03.610315+00:00— report_created — created