Report #22208
[frontier] RAG retrieving semantically similar but contextually isolated chunks, missing document-level meaning
Replace naive chunking with Contextual Retrieval: prepend a short, LLM-generated context to each chunk before embedding. For each chunk, have an LLM produce a line such as 'Context: This chunk is from [doc] discussing [topic]...', then embed the context and the chunk together as one text. Store the raw chunk separately and pass that, not the combined text, to the model for final generation.
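The indexing step described above can be sketched as follows. This is a minimal illustration, not a reference implementation: `contextualize`, `IndexedChunk`, `CONTEXT_PROMPT`, and `fake_llm` are hypothetical names, the prompt wording is invented for this example, and the `llm` argument stands in for whatever model client you actually use. Embedding and storage are left out; the point is only that each record keeps two texts.

```python
from dataclasses import dataclass
from typing import Callable, List

# Illustrative prompt wording (an assumption, not taken from any vendor docs).
CONTEXT_PROMPT = (
    "<document>\n{document}\n</document>\n"
    "<chunk>\n{chunk}\n</chunk>\n"
    "Write one or two sentences situating this chunk within the document, "
    "to improve search retrieval. Answer with the context only."
)


@dataclass
class IndexedChunk:
    raw: str            # stored as-is and used for final generation
    text_to_embed: str  # generated context + raw chunk, sent to the embedder


def contextualize(
    document: str,
    chunks: List[str],
    llm: Callable[[str], str],
) -> List[IndexedChunk]:
    """Build one record per chunk, prefixing LLM-written document context."""
    records = []
    for chunk in chunks:
        # One LLM call per chunk: this is the upfront indexing cost.
        prompt = CONTEXT_PROMPT.format(document=document, chunk=chunk)
        context = llm(prompt).strip()
        combined = f"Context: {context}\n\n{chunk}"
        records.append(IndexedChunk(raw=chunk, text_to_embed=combined))
    return records


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return "This chunk is from the Company X Q3 report discussing revenue."


if __name__ == "__main__":
    records = contextualize(
        "Company X Q3 report.", ["Q3 revenue rose 3%."], fake_llm
    )
    print(records[0].text_to_embed)
```

In a real pipeline, `text_to_embed` would go to the embedding model and the vector index, while `raw` would be kept alongside it (for example as record metadata) and returned at query time.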
Journey Context:
Standard chunking strips each chunk of its parent document's context, so the retriever cannot tell that a 'Q3 revenue' chunk belongs to 'Company X' rather than 'Company Y'. Anthropic's Contextual Retrieval (Sept 2024) adds document-level context to each chunk before embedding; Anthropic reported roughly a 35% reduction in failed retrievals from contextual embeddings alone. It is replacing naive chunking in production systems in 2025. Tradeoff: indexing requires a one-time LLM call per chunk, which increases upfront cost.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T15:41:06.056044+00:00— report_created — created