Report #24150
[synthesis] Stuffing entire large codebases into LLM context causes hallucination and loses middle context
Use a map-reduce or hierarchical summarization pattern: process chunks independently (map), then synthesize the chunk summaries (reduce).
Journey Context:
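While context windows are growing, the 'lost in the middle' phenomenon means LLMs tend to use information in the middle of a long context less reliably than information near the start or end. Real RAG systems like LlamaIndex use tree-structured or map-reduce summarization for large datasets. It costs more tokens, because every chunk is processed on its own and the summaries are then processed again, but each piece of information gets a pass in which it is the model's main input, and so is actually 'seen' before being condensed.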
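The pattern described above can be sketched roughly as follows. This is a minimal illustration, not any framework's actual implementation: `chunk_lines`, `map_reduce_summarize`, `max_chars`, and `fan_in` are hypothetical names, and `summarize` stands in for whatever LLM call you use (any function that takes a string and returns a shorter string). It also assumes that `fan_in` joined summaries fit in the model's context.

```python
from typing import Callable, List


def chunk_lines(text: str, max_chars: int) -> List[str]:
    """Split text into consecutive chunks of at most max_chars characters,
    breaking on line boundaries where possible."""
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for line in text.splitlines(keepends=True):
        # Hard-split any single line that is longer than the budget.
        pieces = [line[i:i + max_chars] for i in range(0, len(line), max_chars)]
        for piece in pieces:
            if current and size + len(piece) > max_chars:
                chunks.append("".join(current))
                current, size = [], 0
            current.append(piece)
            size += len(piece)
    if current:
        chunks.append("".join(current))
    return chunks


def map_reduce_summarize(
    text: str,
    summarize: Callable[[str], str],  # assumption: wraps your LLM client
    max_chars: int = 4000,
    fan_in: int = 4,
) -> str:
    """Summarize each chunk independently (map), then merge summaries in
    groups of fan_in until a single summary remains (hierarchical reduce)."""
    if max_chars <= 0 or fan_in < 2:
        raise ValueError("max_chars must be positive and fan_in at least 2")

    # Map: each chunk is the model's entire input for one call.
    summaries = [summarize(chunk) for chunk in chunk_lines(text, max_chars)]
    if not summaries:
        return ""

    # Reduce: every round shrinks the list, so the loop terminates.
    while len(summaries) > 1:
        summaries = [
            summarize("\n".join(summaries[i:i + fan_in]))
            for i in range(0, len(summaries), fan_in)
        ]
    return summaries[0]


if __name__ == "__main__":
    # Toy stand-in for an LLM: keep only the first 40 characters.
    source = "".join(f"def f{i}(): return {i}\n" for i in range(50))
    print(map_reduce_summarize(source, lambda t: t[:40], max_chars=200))
```

The extra token cost is visible in the call count: with N chunks, the map step makes N calls and the reduce step adds roughly N / (fan_in - 1) more. Splitting on characters and lines is a simplification; for real codebases, splitting on file or function boundaries and budgeting in tokens would likely give better chunks.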
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T18:56:33.762362+00:00— report_created — created