Report #86961
[agent_craft] Agent misses relevant code in the middle of long files or fails to navigate large repositories
Use a repository map (a tree-sitter outline of definitions and call graphs) combined with recent git history, rather than full file contents. Keep the most relevant code at the very beginning or end of the context window, and avoid placing it in the middle 50%.
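The placement rule can be sketched as a small reordering step. This assumes the retrieved chunks are already sorted most-relevant first; how they are scored is out of scope, and `edge_order` is a hypothetical helper name rather than part of any tool. The highest-ranked chunks alternate between the front and the back of the context, so the weakest ones end up in the middle.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def edge_order(ranked: Sequence[T]) -> List[T]:
    """Reorder chunks (most relevant first) so the best ones sit at both
    ends of the context and the least relevant ones land in the middle."""
    front: List[T] = []
    back: List[T] = []
    for i, item in enumerate(ranked):
        # Ranks 0, 2, 4, ... go to the front; ranks 1, 3, 5, ... go to the back.
        if i % 2 == 0:
            front.append(item)
        else:
            back.append(item)
    # Reverse the back half so the second-most-relevant chunk is the very last one.
    return front + back[::-1]


chunks = ["A", "B", "C", "D", "E"]  # "A" is the most relevant chunk
print(edge_order(chunks))  # ['A', 'C', 'E', 'D', 'B']
```

Alternating like this keeps the top two chunks at the two positions the model reads best, instead of stacking everything at the start.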
Journey Context:
The 'Lost in the Middle' phenomenon shows that LLMs attend poorly to information placed in the middle of long contexts, an effect reported across model sizes and for models with long advertised context windows. For coding agents, this makes dumping entire files into the prompt (even with a 128k context) suboptimal: the relevant function definition at line 250 of a 500-line file sits in the middle of that text and is easily overlooked. The solution is a three-tier hierarchy: (1) a RepoMap (signatures and cross-references extracted with tree-sitter, forming an approximate call graph) that provides a compressed 'table of contents' of the entire repo in ~2k tokens, (2) recent git diffs serving as working memory for active changes, and (3) retrieval-augmented snippets of specific function bodies, placed at the edges of the context. This gives the agent global navigation (RepoMap) without the attention cost of full files, and local precision (retrieved snippets) positioned where the model actually reads (the start and end of the context).
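The three tiers can be sketched roughly as follows. To keep the sketch dependency-free, Python's standard `ast` module stands in for tree-sitter, so it only handles Python files; a real RepoMap would use tree-sitter grammars to cover many languages. Calls are resolved by bare name only, so the "calls" lists are an approximate call graph, and no ~2k-token budget is enforced. The function names `outline` and `build_context`, the section headings, and the ordering of the middle tiers are all assumptions. `recent_diff` is expected to be text such as the output of `git diff` or `git log -p -n 5`.

```python
import ast
from typing import Dict, List, Union

FuncNode = Union[ast.FunctionDef, ast.AsyncFunctionDef]


def _describe(fn: FuncNode) -> str:
    """One outline line: signature (positional params only) plus called names."""
    prefix = "async def" if isinstance(fn, ast.AsyncFunctionDef) else "def"
    params = ", ".join(a.arg for a in fn.args.args)
    # Every call anywhere under the def node (body, nested defs, decorators),
    # keyed by bare name: foo(...) -> "foo", obj.bar(...) -> "bar".
    calls = sorted({
        c.func.id if isinstance(c.func, ast.Name) else c.func.attr
        for c in ast.walk(fn)
        if isinstance(c, ast.Call) and isinstance(c.func, (ast.Name, ast.Attribute))
    })
    line = f"{prefix} {fn.name}({params})"
    return line + (f"  # calls: {', '.join(calls)}" if calls else "")


def outline(path: str, source: str) -> List[str]:
    """Tier 1: compact outline of top-level functions, classes and methods."""
    lines = [f"{path}:"]
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            lines.append("  " + _describe(node))
        elif isinstance(node, ast.ClassDef):
            lines.append(f"  class {node.name}:")
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    lines.append("    " + _describe(item))
    return lines


def build_context(files: Dict[str, str], recent_diff: str,
                  snippets: List[str], task: str) -> str:
    """Assemble the prompt; `snippets` (tier 3) must be sorted most relevant first."""
    repo_map = "\n".join(
        line for path in sorted(files) for line in outline(path, files[path])
    )
    head = snippets[:1]        # best snippet opens the context
    tail = snippets[1:][::-1]  # the rest go at the end, second-best placed last
    sections = (
        [f"# Most relevant code\n{s}" for s in head]
        + [f"# Repository map\n{repo_map}",
           f"# Recent changes (git diff)\n{recent_diff}"]  # tier 2
        + [f"# Related code\n{s}" for s in tail]
        + [f"# Task\n{task}"]
    )
    return "\n\n".join(sections)


files = {"app.py": "import util\n\ndef main(argv):\n    cfg = util.load(argv[0])\n"
                   "    run(cfg)\n\ndef run(cfg):\n    print(cfg)\n"}
print("\n".join(outline("app.py", files["app.py"])))
```

Only the compact map and the diff occupy the middle of the prompt here, while the full function bodies the agent actually needs sit at the two edges.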
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:33:15.242618+00:00 | report_created | created