Report #91961
[frontier] Naive chunk embedding loses document-level context, making RAG fail on queries that require broad understanding
Implement contextual retrieval: for each chunk, use an LLM to generate a brief contextual prefix explaining the chunk's position and role within the full document. Prepend this context to the chunk before embedding it and before indexing it for retrieval (for example, in a BM25 index).
Journey Context:
Standard RAG embeds chunks in isolation, so each chunk embedding captures only local meaning. A chunk that says "the new algorithm achieves 95 percent accuracy" does not encode that the result concerns a specific baseline model discussed in Section 3 of a specific paper. Anthropic's Contextual Retrieval technique addresses this by generating a brief context for each chunk with an LLM: given the full document and the chunk, the LLM writes 1-2 sentences situating the chunk within the document. This context is prepended to the chunk before it is embedded and before it is indexed for keyword search. In Anthropic's benchmarks, contextual embeddings combined with contextual BM25 reduced retrieval failures by 49%. The cost is one LLM call per chunk at index time (not query time), so it is paid once per document and amortized over many queries. Tradeoff: added index-time cost and pipeline complexity. Because the retrieval improvement is large and the cost is not repeated per query, it is almost always worth it. Implementation detail: use a fast, cheap model (for example Claude Haiku or GPT-4o-mini) for context generation to keep costs low.
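The index-time step described above can be sketched roughly as follows. This is a minimal illustration, not Anthropic's reference implementation: the prompt wording, the `ContextualChunk` type, the `contextualize_chunks` function, and the `stub_generate` stand-in are all hypothetical names introduced here. The LLM is passed in as a plain callable so that any client (a Haiku or GPT-4o-mini wrapper, for instance) can be substituted.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical prompt wording; tune it for your model and document type.
CONTEXT_PROMPT = (
    "<document>\n{document}\n</document>\n\n"
    "Here is a chunk from that document:\n"
    "<chunk>\n{chunk}\n</chunk>\n\n"
    "Write 1-2 sentences situating this chunk within the overall document "
    "to improve search retrieval. Answer with the context only."
)


@dataclass
class ContextualChunk:
    original: str      # chunk text exactly as it appears in the document
    context: str       # LLM-written situating sentences
    indexed_text: str  # context + chunk: the text to embed and to keyword-index


def contextualize_chunks(
    document: str,
    chunks: List[str],
    generate: Callable[[str], str],
) -> List[ContextualChunk]:
    """Make one LLM call per chunk at index time and prepend the result."""
    results = []
    for chunk in chunks:
        prompt = CONTEXT_PROMPT.format(document=document, chunk=chunk)
        context = generate(prompt).strip()
        # Fall back to the bare chunk if the model returns nothing.
        indexed_text = f"{context}\n\n{chunk}" if context else chunk
        results.append(ContextualChunk(chunk, context, indexed_text))
    return results


def stub_generate(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption: replace with your client)."""
    return ("This chunk is from Section 3, which evaluates the new "
            "algorithm against the baseline model.")


if __name__ == "__main__":
    doc = "Section 3: Evaluation. The new algorithm achieves 95 percent accuracy."
    chunk = "The new algorithm achieves 95 percent accuracy."
    for item in contextualize_chunks(doc, [chunk], stub_generate):
        print(item.indexed_text)
```

The `indexed_text` field is what would be sent to the embedding model and to the keyword index, while `original` can still be what is shown to the answering model at query time. Since the full document is repeated in every prompt, a real pipeline would likely want some form of prompt caching or batching, which this sketch leaves out.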
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T12:56:45.285838+00:00— report_created — created