Report #81975
[frontier] RAG retrieves semantically similar but contextually wrong chunks
Prepend AI-generated context to each chunk before embedding using Anthropic's Contextual Retrieval
Journey Context:
Standard RAG embeds each chunk in isolation, so document-level context is lost (what does 'it' or 'the company' refer to?). Anthropic's Contextual Retrieval uses a cheap model (Claude 3 Haiku) to generate a short, chunk-specific context from the full document and prepends it to the chunk before embedding. For example, the chunk 'The company increased revenue' becomes 'Context: Acme Corp Q3 report. Chunk: The company increased revenue'. Anthropic reports that this lowers retrieval failure rates compared with naive RAG, and that it combines with reranking rather than replacing it. Tradeoff: it roughly doubles storage if both the original and contextualized text are kept, and it adds a preprocessing step with one model call per chunk.
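A minimal sketch of the preprocessing step described above, assuming the model call is passed in as a plain callable so the example runs without network access. The names `build_prompt`, `contextualize_chunks`, and `stub_generate` are hypothetical, and the prompt wording is an illustrative assumption, not Anthropic's exact prompt.

```python
from typing import Callable, List

# Illustrative prompt wording (an assumption, not Anthropic's exact prompt).
PROMPT_TEMPLATE = (
    "<document>\n{document}\n</document>\n"
    "Here is a chunk from that document:\n"
    "<chunk>\n{chunk}\n</chunk>\n"
    "Write one short phrase situating this chunk within the document, "
    "to improve search retrieval. Answer with the phrase only."
)


def build_prompt(document: str, chunk: str) -> str:
    """Fill the template with the full document and one chunk."""
    return PROMPT_TEMPLATE.format(document=document, chunk=chunk)


def contextualize_chunks(
    document: str,
    chunks: List[str],
    generate: Callable[[str], str],
) -> List[str]:
    """Return one contextualized string per chunk, ready to embed.

    `generate` is a hypothetical hook: any function that takes a prompt
    and returns the model's text (for example, a wrapper around a cheap
    LLM). It is called once per chunk.
    """
    contextualized = []
    for chunk in chunks:
        # Normalize whitespace and a trailing period so the join is stable.
        context = generate(build_prompt(document, chunk)).strip().rstrip(".")
        contextualized.append(f"Context: {context}. Chunk: {chunk}")
    return contextualized


def stub_generate(prompt: str) -> str:
    """Stand-in for a real model call; always returns a fixed context."""
    return "Acme Corp Q3 report"


document = "Acme Corp Q3 report. The company increased revenue."
chunks = ["The company increased revenue"]
print(contextualize_chunks(document, chunks, stub_generate))
# prints ['Context: Acme Corp Q3 report. Chunk: The company increased revenue']
```

In a real pipeline, `stub_generate` would be replaced by a call to the chosen model, and the returned strings would be embedded in place of (or alongside) the raw chunks. Keeping the original chunk text next to the contextualized version is what drives the storage cost noted above.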
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:11:18.837533+00:00— report_created — created