Report #25209
[frontier] RAG chunks losing context during retrieval causing hallucinations
Prepend chunk-specific explanatory context to each document chunk before embedding. Use an LLM to generate a short context sentence describing where the chunk fits in the document.
Journey Context:
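Naive RAG splits documents and embeds the raw chunks, so each chunk loses the surrounding narrative: which document it came from, which section it belongs to, and what its pronouns or abbreviations refer to. Anthropic's Contextual Retrieval (2024) instead prompts an LLM with something like 'Here is the document: {doc}\n\nHere is the chunk: {chunk}\n\nGive a brief context...' and prepends the generated context to the chunk before it is embedded. According to this report, the approach beats HyDE and plain dense-retrieval baselines on RAG benchmarks. Tradeoff: indexing requires one LLM call per chunk, each carrying the full document, which is costly. In return, the reported retrieval gain is in the 20-40% range. Anthropic states its own result as a reduction in failed retrievals (about 35% fewer top-20 retrieval failures with contextual embeddings alone) rather than as a rise in accuracy.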
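The indexing step described above can be sketched roughly as follows. This is a minimal illustration, not Anthropic's implementation: `split_into_chunks`, `contextualize`, and `ContextualChunk` are hypothetical names, the `llm` argument stands in for whatever model client is used, and the prompt is copied as quoted in this report, including the point where it is truncated.

```python
from dataclasses import dataclass
from typing import Callable, List

# Prompt wording as quoted in the report. The source truncates it after
# "Give a brief context", so the ellipsis is kept rather than filled in.
CONTEXT_PROMPT = (
    "Here is the document: {doc}\n\n"
    "Here is the chunk: {chunk}\n\n"
    "Give a brief context..."
)


@dataclass
class ContextualChunk:
    original: str       # raw chunk text, kept for display and citation
    context: str        # LLM-generated sentence situating the chunk
    text_to_embed: str  # context + chunk, the string passed to the embedder


def split_into_chunks(doc: str, max_chars: int = 200) -> List[str]:
    """Naive paragraph-based splitter (assumption: any chunker works here)."""
    chunks: List[str] = []
    current = ""
    for para in doc.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # Start a new chunk when adding this paragraph would exceed the limit.
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def contextualize(
    doc: str, chunks: List[str], llm: Callable[[str], str]
) -> List[ContextualChunk]:
    """Make one LLM call per chunk and prepend the returned context."""
    results = []
    for chunk in chunks:
        prompt = CONTEXT_PROMPT.format(doc=doc, chunk=chunk)
        context = llm(prompt).strip()
        results.append(ContextualChunk(chunk, context, f"{context}\n\n{chunk}"))
    return results
```

Only `text_to_embed` would go to the embedding model; `original` is what gets shown to the user or passed to the generator. Because every call includes the whole document, input tokens grow roughly with document length times the number of chunks, which is where the indexing cost noted above comes from.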
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:42:57.531238+00:00— report_created — created