Report #3899
[architecture] Chunks retrieved from a RAG system miss surrounding context because each chunk is treated as self-contained
Prepend a concise, LLM-generated contextual summary to each chunk before embedding and indexing. At retrieval, search against the contextualized text but feed the generator the original chunk. Anthropic reported a 49% reduction in top-20 retrieval failures (67% with reranking) when the contextualized chunks were used for both the embeddings and the BM25 index.
Journey Context:
Fixed-size or semantic chunking only changes where the boundaries fall; it does not rescue a passage that is meaningless without its parent section. Generic document summaries added to every chunk show little gain because they are not specific to the chunk. The right fix is 'contextual retrieval': at index time, prompt a small model with the whole document and the chunk, asking for a short context that situates the chunk within the document. Prepend that context to the chunk text before both embedding it and adding it to the sparse BM25 index. No extra LLM call is needed at query time, and the indexing cost is small (~$1 per million document tokens with prompt caching). The common mistake is skipping this step because it adds an LLM call during ingestion, even though it is one of the highest-ROI retrieval improvements available.
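A minimal sketch of the index-time step, under stated assumptions: `generate_context` is a hypothetical callable that wraps whatever small model you use, the prompt wording in `CONTEXT_PROMPT` is illustrative rather than a prescribed template, and `search` is a toy token-overlap scorer standing in for a real embedding plus BM25 index. The point it shows is that each record keeps two texts: the contextualized one for searching and the original one for the generator.

```python
from dataclasses import dataclass
from typing import Callable, List

# Illustrative prompt wording (an assumption, not an official template).
CONTEXT_PROMPT = (
    "<document>\n{document}\n</document>\n"
    "Here is a chunk from that document:\n"
    "<chunk>\n{chunk}\n</chunk>\n"
    "Write a short context that situates this chunk within the document "
    "to improve search retrieval. Answer with the context only."
)


@dataclass
class IndexedChunk:
    original: str        # handed to the generator at answer time
    contextualized: str  # embedded and BM25-indexed at ingestion time


def contextualize(
    document: str,
    chunks: List[str],
    generate_context: Callable[[str], str],  # hypothetical LLM wrapper
) -> List[IndexedChunk]:
    """Build one record per chunk with an LLM-written context prepended."""
    records = []
    for chunk in chunks:
        prompt = CONTEXT_PROMPT.format(document=document, chunk=chunk)
        context = generate_context(prompt).strip()
        records.append(
            IndexedChunk(original=chunk, contextualized=f"{context}\n\n{chunk}")
        )
    return records


def search(records: List[IndexedChunk], query: str, k: int = 1) -> List[str]:
    """Toy stand-in for the real index: rank by token overlap with the
    contextualized text, but return the original chunk text."""
    query_tokens = set(query.lower().split())
    ranked = sorted(
        records,
        key=lambda r: len(query_tokens & set(r.contextualized.lower().split())),
        reverse=True,
    )
    return [r.original for r in ranked[:k]]
```

In a real pipeline, `contextualized` would be passed to the embedding model and the BM25 indexer, and the document portion of the prompt would be cached across chunks of the same document to keep ingestion cost low.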
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T18:29:22.552753+00:00— report_created — created