Report #40695
[frontier] Naive RAG retrieves chunks that lack document context, causing hallucinations or missed information
Prepend contextual headers to chunks before embedding (Contextual Retrieval), then use hybrid BM25 + vector search and reranking, to preserve surrounding meaning
Journey Context:
Standard RAG embeds chunks in isolation, so document-level context is lost (for example, a chunk that says 'it' without naming what 'it' refers to). Anthropic's Contextual Retrieval (2024-2025) uses a secondary LLM pass to write a short explanatory context for each chunk and prepends it before the chunk is embedded. Combined with hybrid search (BM25 + vector) and Cohere reranking, this substantially improves recall over naive vector-only search; Anthropic reports up to 67% fewer failed retrievals with all three combined. It is replacing basic RAG in production.
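The flow described above can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions, not Anthropic's implementation: `contextualize` is a hypothetical stand-in for the secondary LLM pass (the context strings are written by hand here), `cosine_scores` is a toy bag-of-words substitute for a real embedding model, reciprocal rank fusion is one common way to merge BM25 and vector rankings, and the reranking step is left out.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()

def contextualize(chunk, context):
    # Hypothetical stand-in for the LLM pass: a real pipeline would ask a
    # model to write `context` from the full document.
    return f"{context} {chunk}"

def bm25_scores(query, docs, k1=1.5, b=0.75):
    toks = [tokenize(d) for d in docs]
    n = len(toks)
    avgdl = sum(len(t) for t in toks) / n
    df = Counter(term for t in toks for term in set(t))
    scores = []
    for t in toks:
        tf = Counter(t)
        s = 0.0
        for q in tokenize(query):
            if q in tf:
                idf = math.log(1 + (n - df[q] + 0.5) / (df[q] + 0.5))
                s += idf * tf[q] * (k1 + 1) / (tf[q] + k1 * (1 - b + b * len(t) / avgdl))
        scores.append(s)
    return scores

def cosine_scores(query, docs):
    # Toy bag-of-words "embedding" (assumption); use a real model in practice.
    q = Counter(tokenize(query))
    qn = math.sqrt(sum(v * v for v in q.values()))
    out = []
    for d in docs:
        v = Counter(tokenize(d))
        dot = sum(q[t] * v[t] for t in q)
        norm = qn * math.sqrt(sum(x * x for x in v.values()))
        out.append(dot / norm if norm else 0.0)
    return out

def rrf(score_lists, k=60):
    # Reciprocal rank fusion: sum 1 / (k + rank) over each ranking.
    fused = [0.0] * len(score_lists[0])
    for scores in score_lists:
        order = sorted(range(len(scores)), key=lambda i: -scores[i])
        for rank, i in enumerate(order):
            fused[i] += 1.0 / (k + rank + 1)
    return fused

def hybrid_search(query, docs):
    # Returns document indices, best match first.
    fused = rrf([bm25_scores(query, docs), cosine_scores(query, docs)])
    return sorted(range(len(docs)), key=lambda i: -fused[i])

# Made-up example data for illustration only.
chunks = [
    "It grew 3% over the previous quarter.",
    "The company opened two new offices in Europe.",
    "Revenue growth across the industry was flat in 2023.",
]
contexts = [
    "From ACME Corp's Q2 2023 filing, on revenue growth:",
    "From ACME Corp's Q2 2023 filing, on office expansion:",
    "From an industry overview report:",
]
query = "ACME Q2 2023 revenue growth"

naive_top = hybrid_search(query, chunks)[0]
contextual = [contextualize(c, ctx) for c, ctx in zip(chunks, contexts)]
contextual_top = hybrid_search(query, contextual)[0]
print("naive top chunk:", chunks[naive_top])
print("contextual top chunk:", chunks[contextual_top])
```

On this toy data the bare chunk "It grew 3% ..." shares no terms with the query, so the naive index surfaces the unrelated industry chunk; once the context is prepended, the ACME chunk ranks first. In a fuller pipeline the top results from `hybrid_search` would then be passed to a reranker before being handed to the generator.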
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:46:46.363934+00:00— report_created — created