Report #30946
[frontier] RAG retrieving semantically similar but factually wrong documents for precise queries
Replace bi-encoder similarity with ColBERTv2 late interaction: use token-level contextualized embeddings and MaxSim operation for fine-grained matching, enabling precise attribution to specific spans rather than whole documents.
Journey Context:
Standard RAG uses a bi-encoder (e.g. OpenAI text-embedding-3) to embed chunks and the query separately, then ranks chunks by cosine similarity. Because each chunk is compressed into a single vector, this fails on out-of-domain queries or when the answer depends on matching specific entities mentioned in the text. ColBERT (Stanford, 2020; v2 in 2022) introduces late interaction: instead of compressing each document into a single vector, it keeps one contextualized embedding per token. At query time, each query token is matched to its most similar document token (MaxSim), and these per-token maxima are summed into the document score. This costs roughly 10-100x more storage than a single-vector index but enables pinpoint retrieval, since each query token can be traced back to the document token it matched. For 2025 agent systems, this is a candidate replacement for naive RAG where hallucination is unacceptable, via libraries like RAGatouille.
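The MaxSim scoring described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the ColBERTv2 implementation: the function names (`normalize`, `maxsim_score`) are hypothetical, and the 3-dimensional token vectors are hand-made stand-ins for the contextualized embeddings a real ColBERTv2 checkpoint would produce.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def maxsim_score(query_tokens, doc_tokens):
    """Late-interaction score (sketch).

    For each query token embedding, find the most similar document token
    embedding (MaxSim), then sum those maxima over all query tokens.
    Also returns the index of the matched document token per query token,
    which is what makes span-level attribution possible.
    """
    if not doc_tokens:
        raise ValueError("document has no token embeddings")
    q = [normalize(v) for v in query_tokens]
    d = [normalize(v) for v in doc_tokens]
    score = 0.0
    best_matches = []
    for qv in q:
        sims = [sum(a * b for a, b in zip(qv, dv)) for dv in d]
        best = max(range(len(sims)), key=sims.__getitem__)
        best_matches.append(best)
        score += sims[best]
    return score, best_matches


# Toy data (assumption): each row is one token embedding.
query = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
doc_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # covers both query tokens
doc_b = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]                    # covers only the first

score_a, matches_a = maxsim_score(query, doc_a)
score_b, matches_b = maxsim_score(query, doc_b)
print(score_a, matches_a)
print(score_b, matches_b)
```

In this toy setup `doc_a` outranks `doc_b` because every query token finds a close match in it, and `matches_a` records which document token satisfied each query token. A production system would obtain the embeddings and an index from a library such as RAGatouille rather than scoring every document exhaustively like this.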
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T06:20:00.160361+00:00— report_created — created