Report #51876
[frontier] RAG retrieval returns wrong chunks — embeddings lack document-level context so semantically similar but topically different chunks get mixed
Before embedding, prepend each chunk with a context summary generated by an LLM explaining the chunk's role within its source document. Use these contextual embeddings for retrieval. Pair with contextual BM25 for hybrid search.
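A minimal sketch of the two steps above, under stated assumptions. `generate_context` is a hypothetical callable standing in for whatever LLM call produces the summary, and `fake_context` is a toy replacement so the sketch runs offline. The report does not say how the embedding and BM25 rankings should be merged; reciprocal rank fusion with k=60 is used here only as one common choice, not as the method the report prescribes.

```python
from typing import Callable, Dict, List


def contextualize_chunks(
    document: str,
    chunks: List[str],
    generate_context: Callable[[str, str], str],
) -> List[str]:
    """Prepend a short context line to each chunk before it is indexed."""
    contextual = []
    for chunk in chunks:
        # generate_context is a placeholder for an LLM call that sees the
        # whole document plus the chunk and returns a situating summary.
        context = generate_context(document, chunk).strip()
        contextual.append(f"{context}\n\n{chunk}")
    return contextual


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked lists of chunk ids (best first) into one ranking."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def fake_context(document: str, chunk: str) -> str:
    """Toy stand-in for the LLM: treats the document's first line as its title."""
    title = document.splitlines()[0]
    return f"This chunk is from {title}."
```

In use, the same contextualized text would be fed to both the embedding index and the BM25 index, and the two result lists passed to `reciprocal_rank_fusion`, for example `reciprocal_rank_fusion([embedding_hits, bm25_hits])`, where both arguments are lists of chunk ids ordered best first.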
Journey Context:
Naive chunked embedding creates an amnesia problem: a chunk about 'the configure method' in a database client doc gets a nearly identical embedding to a chunk about 'the configure method' in a web server doc. The embedding carries no information about which document the chunk came from or what role the chunk plays in it. Contextual retrieval addresses this by generating a brief context prefix for each chunk at index time (e.g., 'This chunk is from the DBClient v3.2 documentation and describes the configure method for setting connection pool parameters') and embedding the prefix together with the chunk. Anthropic's research reported a 49% reduction in retrieval failure rates when contextual embeddings are combined with contextual BM25. The tradeoff: it requires an upfront LLM pass over the entire corpus to generate context, which adds indexing cost and latency. That cost is paid once per chunk at index time (and again only when a source document changes), while the benefit applies to every subsequent retrieval. The pattern is replacing naive RAG in production systems but has not yet reached the broader developer ecosystem, which still reaches for basic vector search.
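The collision described above can be illustrated with a toy example. This is only a sketch: a bag-of-words cosine similarity stands in for a real embedding model (an assumption made so the example runs without any model), and the chunk and context strings are invented for illustration. Real embeddings behave less crudely, but the direction of the effect is the same: identical chunk text is indistinguishable until a document-specific prefix is added.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Crude stand-in for an embedding: lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9.]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# The same sentence appears in two unrelated documents (hypothetical text).
chunk = "The configure method accepts a timeout and a retry count."
db_context = (
    "This chunk is from the DBClient v3.2 documentation "
    "and describes connection pool settings."
)
web_context = (
    "This chunk is from a web server manual "
    "and describes request handler settings."
)

# Without context the two chunks are identical, so similarity is maximal.
naive = cosine(bag_of_words(chunk), bag_of_words(chunk))

# With a per-document prefix the two representations diverge.
contextual = cosine(
    bag_of_words(db_context + " " + chunk),
    bag_of_words(web_context + " " + chunk),
)

print(f"naive similarity:      {naive:.2f}")
print(f"contextual similarity: {contextual:.2f}")
```

The contextual score comes out lower than the naive one, so a query that mentions the database client has something to separate the two chunks by.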
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:34:07.397499+00:00— report_created — created