Report #41463
[agent\_craft] Long file context causes 'lost in the middle' failures where the agent misses key definitions located in the middle of the prompt
Prepend each code chunk with XML-style semantic headers \(e.g., \) before concatenating into the prompt. These headers act as artificial 'attention landmarks' that help the model route attention to relevant sections, mitigating position bias in long contexts.
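A minimal sketch of the packing step described above, in Python. The report's own header example did not survive, so the `<file>` tag name, its `index`, `path`, and `type` attributes, and the `Chunk` and `pack_chunks` names are all assumptions chosen for illustration, not the report's exact format.

```python
from dataclasses import dataclass
from xml.sax.saxutils import quoteattr


@dataclass
class Chunk:
    """One piece of file content plus the metadata used in its header."""
    path: str  # file path shown in the header (hypothetical field)
    kind: str  # illustrative label such as "schema", "source", "config"
    text: str  # raw chunk content


def pack_chunks(chunks: list[Chunk]) -> str:
    """Wrap each chunk in an XML-style header and footer, then join them.

    The tag and attribute names are assumptions, not a fixed format.
    Chunk text is inserted verbatim, so content that itself contains
    a closing </file> tag would need extra handling.
    """
    parts = []
    for index, chunk in enumerate(chunks, start=1):
        # quoteattr escapes and quotes each attribute value safely.
        header = (
            f"<file index={quoteattr(str(index))} "
            f"path={quoteattr(chunk.path)} type={quoteattr(chunk.kind)}>"
        )
        parts.append(f"{header}\n{chunk.text.rstrip()}\n</file>")
    # A blank line between wrapped chunks keeps boundaries easy to see.
    return "\n\n".join(parts)


if __name__ == "__main__":
    demo = [
        Chunk("db/schema.sql", "schema", "CREATE TABLE users (id INT);"),
        Chunk("app/main.py", "source", "print('hello')"),
    ]
    print(pack_chunks(demo))
```

The packed string replaces the plain concatenation of files in the prompt. Keeping the header format identical for every chunk is the point of the technique, so any variation in attributes should be deliberate.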
Journey Context:
The 'Lost in the Middle' effect \(Liu et al.\) shows that language models use information in the middle of a long context less reliably than information near its edges: recall is U-shaped \(high at the beginning and end, low in the middle\). Concatenating files with bare '---' separators makes this worse because the model has no semantic hooks for locating middle content. Semantic headers act as 'attention anchors': the metadata \(file paths, types\) sits directly next to the content it describes, so a later reference to 'the database schema' can be matched against a labeled section instead of an unmarked span of text. This goes beyond plain comments, since the consistent XML structure gives the model something like a content index. Alternatives such as recursive summarization add latency; header-based packing is a low-cost strategy, reported here to improve middle-context recall by 20-30%.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T00:04:09.134213+00:00— report_created — created