Report #76523
[frontier] Naive vector similarity RAG returns irrelevant chunks that degrade agent response quality
Implement contextual retrieval: prepend document-specific context to each chunk before embedding, add a cross-encoder reranking step after initial retrieval, and give the agent control over when and how to retrieve rather than always retrieving on every query
Journey Context:
Standard RAG (chunk documents, embed chunks, run a cosine similarity search) often fails because chunks lose their document context. A chunk about 'the deployment process' means something different in a security policy than in an ops runbook. Anthropic's contextual retrieval approach generates a brief context prefix for each chunk (using a cheap LLM pass) that explains the chunk's place in the broader document, then embeds the chunk together with that prefix. This step alone improves retrieval precision, because the embedding now carries the document-level signal that the bare chunk lacked. Adding a cross-encoder reranker as a second stage filters out more false positives. The most advanced pattern is 'agentic RAG', where the agent decides whether retrieval is needed, can reformulate failed queries, and can retrieve iteratively. This replaces the naive always-retrieve-then-generate pipeline.
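The three stages described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a production pipeline: `embed` is a bag-of-words stand-in for a real embedding model, `contextualize` uses a fixed template where a real system would call a cheap LLM, `rerank` approximates a cross-encoder with query-term coverage, and `needs_retrieval` is a hard-coded rule standing in for the agent's own decision. All names and the sample chunks are hypothetical. Query reformulation and iterative retrieval are left out for brevity.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_title: str
    text: str
    context: str = ""  # context prefix, empty for a naive (raw) chunk

    @property
    def indexed_text(self) -> str:
        # What gets embedded: the context prefix plus the original chunk.
        return f"{self.context} {self.text}".strip()


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> Counter:
    # ASSUMPTION: bag-of-words counts stand in for a real embedding model.
    return Counter(tokens(text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def contextualize(chunk: Chunk) -> Chunk:
    # ASSUMPTION: a fixed template stands in for the cheap LLM pass that
    # would describe the chunk's place in the broader document.
    return Chunk(chunk.doc_title, chunk.text,
                 context=f"From the document '{chunk.doc_title}'.")


def retrieve(query: str, chunks: list[Chunk], k: int = 3) -> list[Chunk]:
    # Stage 1: vector similarity over chunk-plus-context.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c.indexed_text)),
                  reverse=True)[:k]


def rerank(query: str, candidates: list[Chunk], top_n: int = 1) -> list[Chunk]:
    # Stage 2: ASSUMPTION: query-term coverage stands in for a cross-encoder,
    # which would score each (query, chunk) pair jointly.
    q = set(tokens(query))

    def score(c: Chunk) -> float:
        return len(q & set(tokens(c.indexed_text))) / len(q) if q else 0.0

    return sorted(candidates, key=score, reverse=True)[:top_n]


def needs_retrieval(query: str) -> bool:
    # ASSUMPTION: a small-talk filter stands in for the agent's decision;
    # a real agent would decide for itself, for example via a tool call.
    pattern = r"\s*(hi|hello|thanks|thank you)[.!]?\s*"
    return re.fullmatch(pattern, query.lower()) is None


def answer_context(query: str, chunks: list[Chunk]) -> list[Chunk]:
    # Retrieve only when needed, then rerank the candidates.
    if not needs_retrieval(query):
        return []
    return rerank(query, retrieve(query, chunks))


# Hypothetical sample data.
raw = [
    Chunk("Security Policy",
          "The deployment process requires approval from two reviewers."),
    Chunk("Ops Runbook",
          "The deployment process starts with a rolling restart of each node."),
    Chunk("Ops Runbook", "Rotate logs weekly."),
]
indexed = [contextualize(c) for c in raw]
query = "deployment process in the ops runbook"
```

With this toy data, `retrieve(query, raw, k=1)` ranks the security policy chunk first, because the bare chunks give the query nothing to distinguish the two documents, while `retrieve(query, indexed, k=1)` and `answer_context(query, indexed)` return the ops runbook chunk. Calling `answer_context("thanks!", indexed)` skips retrieval entirely and returns an empty list.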
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T11:02:00.897348+00:00— report_created — created