Report #79789
[frontier] Naive RAG retrieves chunks without surrounding context, causing misinterpretation and hallucinations
Prepend a contextual header that situates each chunk within its parent document before embedding, then retrieve with hybrid search that combines embedding similarity and BM25
Journey Context:
Standard RAG splits documents into isolated chunks, which strips each chunk of its surrounding context (e.g., in a chunk saying 'it increases costs', the referent of 'it' is lost). Anthropic's Contextual Retrieval uses the full document to generate a concise context header for each chunk, prepends that header before the chunk is embedded and BM25-indexed, and combines the embedding and BM25 results in a hybrid search. Compared with naive vector-similarity retrieval alone, this improves retrieval accuracy for specific details buried in long documents; Anthropic reports a 49% reduction in failed retrievals, and 67% when a reranking step is added.
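The approach described above can be sketched in a few lines of dependency-free Python. This is a minimal illustration under stated assumptions, not Anthropic's implementation: `contextualize` is a hypothetical stand-in for the LLM call that would write the header from the full document, the bag-of-words cosine score stands in for a real embedding model, and reciprocal rank fusion is one common way to merge the two rankings (the report does not say which fusion method to use). The document titles and chunk text are invented for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def contextualize(doc_title, chunk):
    # Hypothetical stand-in: a real pipeline would ask an LLM, given the
    # full document, for a short header situating this chunk.
    return f"From the document '{doc_title}'. {chunk}"


class BM25:
    """Plain Okapi BM25 over a small in-memory list of texts."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.tfs = [Counter(tokenize(d)) for d in docs]
        self.lens = [sum(tf.values()) for tf in self.tfs]
        self.avg_len = sum(self.lens) / len(docs)
        n = len(docs)
        df = Counter(t for tf in self.tfs for t in tf)
        self.idf = {t: math.log(1 + (n - c + 0.5) / (c + 0.5)) for t, c in df.items()}

    def scores(self, query):
        out = []
        for tf, length in zip(self.tfs, self.lens):
            s = 0.0
            for t in tokenize(query):
                if t in tf:
                    norm = tf[t] + self.k1 * (1 - self.b + self.b * length / self.avg_len)
                    s += self.idf[t] * tf[t] * (self.k1 + 1) / norm
            out.append(s)
        return out


def cosine_scores(docs, query):
    # Assumption: bag-of-words cosine as a toy substitute for dense embeddings.
    q = Counter(tokenize(query))
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    out = []
    for d in docs:
        v = Counter(tokenize(d))
        dot = sum(q[t] * v[t] for t in q)
        d_norm = math.sqrt(sum(x * x for x in v.values()))
        out.append(dot / (q_norm * d_norm) if q_norm and d_norm else 0.0)
    return out


def rank(scores):
    # Indices ordered from best to worst score.
    return sorted(range(len(scores)), key=lambda i: -scores[i])


def hybrid_search(docs, query, k=60):
    # Reciprocal rank fusion of the BM25 ranking and the similarity ranking.
    fused = [0.0] * len(docs)
    for scores in (BM25(docs).scores(query), cosine_scores(docs, query)):
        for pos, i in enumerate(rank(scores)):
            fused[i] += 1.0 / (k + pos + 1)
    return rank(fused)


# Two chunks that are ambiguous on their own (illustrative data).
chunks = [
    ("Acme Q2 2023 financial report", "It increases costs by 3% over the previous quarter."),
    ("Garden irrigation guide", "It increases costs when pumps run all day."),
]
raw = [text for _, text in chunks]
contextual = [contextualize(title, text) for title, text in chunks]

print(hybrid_search(raw, "Acme costs"))
print(hybrid_search(contextual, "Acme costs"))
```

With the raw chunks, neither text mentions Acme, so the query can only match on "costs" and the ranking is decided by chunk length. Once the header is prepended, both the lexical and the similarity scorer can see which document each chunk came from, and the Acme chunk moves to the top.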
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T16:31:36.198554+00:00— report_created — created