Report #68725
[frontier] Static retrieval fails on complex queries requiring synthesis across multiple document chunks
Implement Anthropic's Contextual Retrieval: use an LLM (Claude 3.5 Sonnet here) to generate a concise summary of the surrounding document context for each chunk, prepend that summary to the chunk before embedding and BM25 indexing, then retrieve with hybrid search that fuses embedding and BM25 results, optionally followed by a reranking step
Journey Context:
Standard RAG embeds chunks in isolation, losing document-level context. This causes retrieval failures when queries reference global context (e.g., 'the policy mentioned in the previous section') or require synthesizing information across distant sections. The usual alternative, sliding-window chunking with large overlap, bloats the retrieved text and dilutes precision. Contextual Retrieval instead uses an LLM to generate a short context paragraph for each chunk (Anthropic's write-up cites roughly 50-100 tokens) that explains where the chunk sits in the document hierarchy (parent sections, preceding topics). This context is prepended to the chunk before it is embedded and indexed. Anthropic reported a 35% reduction in top-20 retrieval failure rate from contextual embeddings alone, and 49% when combined with contextual BM25. The tradeoff is preprocessing cost (one LLM call per chunk at indexing time, which prompt caching can reduce) in exchange for better query-time accuracy. As of 2025, this approach is increasingly used in place of naive RAG in production systems.
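The indexing and fusion steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Anthropic's implementation: `llm` is a hypothetical callable standing in for whatever model client is used, the prompt wording is illustrative, and reciprocal rank fusion is one common way to merge embedding and BM25 rankings (the report does not specify a fusion method).

```python
from typing import Callable, Dict, List

# Illustrative prompt template (an assumption, not Anthropic's exact prompt).
CONTEXT_PROMPT = (
    "<document>\n{document}\n</document>\n"
    "Here is a chunk from the document:\n<chunk>\n{chunk}\n</chunk>\n"
    "Write a short context that situates this chunk within the document "
    "to improve search retrieval. Answer with the context only."
)


def contextualize_chunks(
    document: str, chunks: List[str], llm: Callable[[str], str]
) -> List[str]:
    """Prepend an LLM-generated context header to each chunk.

    `llm` is a hypothetical callable (prompt -> text); plug in any client.
    The returned strings are what would be embedded and BM25-indexed.
    """
    contextualized = []
    for chunk in chunks:
        prompt = CONTEXT_PROMPT.format(document=document, chunk=chunk)
        context = llm(prompt).strip()
        contextualized.append(f"{context}\n\n{chunk}")
    return contextualized


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked lists of chunk ids (e.g. embedding and BM25 results).

    Each id scores sum(1 / (k + rank)) over the lists it appears in;
    ties are broken by id so the output is deterministic.
    """
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Usage with a stub in place of a real model call.
def stub_llm(prompt: str) -> str:
    return "From the 'Refund Policy' section of the employee handbook."


indexed = contextualize_chunks(
    "full document text",
    ["Refunds are issued within 30 days."],
    stub_llm,
)
fused = reciprocal_rank_fusion([["a", "b", "c"], ["b", "c", "a"]])
```

In a real pipeline the strings in `indexed` would feed both the embedding model and the BM25 index, and the two ranked lists passed to `reciprocal_rank_fusion` would come from those two retrievers at query time.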
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:50:18.225846+00:00— report_created — created