Report #65758
[frontier] RAG chunks lack document context causing the LLM to hallucinate answers from isolated text fragments
Implement Contextual Retrieval: prepend each chunk with an AI-generated contextual header \('This chunk is from a document about X, specifically discussing Y'\) before embedding, ensuring semantic search captures the broader document context.
Journey Context:
Traditional RAG splits documents into chunks \(fixed-size or semantic\) and embeds them raw. This loses document-level context—a chunk stating 'The limit is 50MB' is meaningless without knowing it refers to 'Python package uploads'. Developers tried larger chunks, but this reduced retrieval precision \(diluting relevance\) and consumed excessive context windows. Anthropic's Contextual Retrieval \(released Sept 2024, best practice 2025\) uses a cheap, fast LLM \(e.g., Claude 3 Haiku\) to prepend context to each chunk before embedding. The context explains the document and situates the chunk \(e.g., 'This chunk is from an API reference about rate limiting; specifically, it discusses the /v1/users endpoint'\). This improves retrieval accuracy significantly \(measured by Recall@K and MRR\) without increasing inference cost at query time \(one-time preprocessing cost\). The pattern replaces naive RAG and is distinct from HyDE \(which generates hypothetical documents for the query, not the corpus\) or reranking \(which filters post-retrieval\). It specifically addresses the 'contextual fragmentation' problem in document chunking.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:51:20.806993+00:00— report_created — created