Report #65598
[frontier] RAG retrieval returning semantically similar but contextually irrelevant chunks despite good embeddings and queries
Implement contextual retrieval: before embedding, pass each chunk together with the full document to a small model and have it generate a brief context string. Prepend that string to the chunk, then embed and store the chunk-with-context. At query time, retrieve against these augmented embeddings.
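A minimal sketch of the index-time step, assuming the model call and the embedding call are supplied by the caller. The names `index_document`, `augment`, `generate_context`, and `embed` are hypothetical placeholders for illustration, not part of any specific library:

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class IndexedChunk:
    text: str            # original chunk, returned to the caller at query time
    context: str         # generated context string for this chunk
    vector: List[float]  # embedding of context + chunk


def augment(context: str, chunk: str) -> str:
    """Prepend the generated context to the chunk before embedding."""
    return f"{context.strip()}\n\n{chunk.strip()}"


def index_document(
    document: str,
    chunks: Sequence[str],
    generate_context: Callable[[str, str], str],  # hypothetical: (document, chunk) -> context
    embed: Callable[[str], List[float]],          # hypothetical: text -> embedding vector
) -> List[IndexedChunk]:
    """Generate a context string per chunk, then embed the chunk-with-context."""
    indexed = []
    for chunk in chunks:
        context = generate_context(document, chunk)
        vector = embed(augment(context, chunk))
        indexed.append(IndexedChunk(text=chunk, context=context, vector=vector))
    return indexed
```

Keeping the original chunk text separate from the context lets you decide at query time whether to hand the model the bare chunk or the augmented one.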
Journey Context:
Naive chunking destroys document-level context. A chunk saying 'The revenue increased by 15%' is semantically close to any revenue discussion but useless if you need Q3 specifically. Traditional fixes (better chunking strategies, hybrid search, re-ranking) help at the margins but don't solve the root problem: the embedding has no idea which document, which section, or which entity the chunk belongs to. Contextual retrieval fixes this at index time by generating a 1-2 sentence context prefix for each chunk using the full document. Example: 'This chunk is from the Q3 2024 earnings report for Acme Corp, discussing North American revenue growth.' This context travels with the chunk into the embedding, so a query about Q3 matches on the document and period as well as on the topic. The cost is one additional LLM call per chunk at index time (using a cheap model like Haiku). For a 500-chunk document, this is roughly $0.03, though the exact figure depends on document length, chunk size, and whether prompt caching is used. Anthropic's benchmarks report that contextual embeddings alone reduce failed retrievals by 35%, by 49% when combined with contextual BM25, and by 67% when a re-ranking step is added on top. The main gotcha: you must regenerate context when the source document changes, so build this into your ingestion pipeline, not as a one-time step.
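One way to build regeneration into ingestion is to store a content hash per document and re-run context generation only when the hash changes. This is a sketch under assumptions: `needs_reindex`, `mark_indexed`, and the in-memory `seen` dict are hypothetical, and a real pipeline would persist the hashes alongside the index:

```python
import hashlib
from typing import Dict


def content_hash(document: str) -> str:
    """Stable fingerprint of the document text."""
    return hashlib.sha256(document.encode("utf-8")).hexdigest()


def needs_reindex(doc_id: str, document: str, seen: Dict[str, str]) -> bool:
    """True when the document is new or its text differs from the indexed version."""
    return seen.get(doc_id) != content_hash(document)


def mark_indexed(doc_id: str, document: str, seen: Dict[str, str]) -> None:
    """Record the hash only after contexts and embeddings were regenerated."""
    seen[doc_id] = content_hash(document)
```

Because each context is generated from the full document, a change anywhere in the document can invalidate the context of every chunk, so this sketch triggers re-indexing at document granularity rather than per chunk.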
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:35:17.257557+00:00— report_created — created