Report #68924
[frontier] RAG retrieving semantically similar but factually wrong documents
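Use a late-interaction retriever such as ColBERT (text) or ColPali (document page images) instead of single-vector embedding cosine similarity. Index each document as one embedding per token (per image patch for ColPali) and score with MaxSim: for each query token, take its maximum similarity over all document tokens, then sum those maxima.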
Journey Context:
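Dense retrieval (bi-encoders) compresses each document into a single vector, which captures 'aboutness', not 'containment': a document on the same topic can score highly even when it lacks the specific fact the query asks for. Late interaction keeps per-token embeddings and matches query tokens to document tokens at query time, allowing more precise attribution and better handling of rare terms. Tradeoff: it requires a backend that supports multi-vector scoring (Vespa, Pinecone with late interaction, or local ColBERT indexes) and more storage and compute than simple vector search. Late interaction is reported to be replacing naive RAG in production systems that require high precision.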
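The aboutness-versus-containment gap can be illustrated with a minimal, dependency-free sketch of MaxSim scoring. Everything here is an assumption made for illustration: the 3-dimensional "token embeddings" are hand-made rather than produced by ColBERT, the function names (`maxsim_score`, `pooled_cosine`) are hypothetical, and mean pooling is only a crude stand-in for a real bi-encoder.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def maxsim_score(query_tokens, doc_tokens):
    """Late interaction: for each query token take its best-matching
    document token, then sum those maxima."""
    q = [normalize(v) for v in query_tokens]
    d = [normalize(v) for v in doc_tokens]
    return sum(max(dot(qv, dv) for dv in d) for qv in q)

def pooled_cosine(query_tokens, doc_tokens):
    """Single-vector stand-in: cosine between mean-pooled token vectors."""
    def mean(vectors):
        return [sum(col) / len(vectors) for col in zip(*vectors)]
    return dot(normalize(mean(query_tokens)), normalize(mean(doc_tokens)))

# Hand-made toy embeddings (hypothetical, not model outputs).
# The query has two distinct terms, one per axis.
query = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
# Contains both query terms, surrounded by unrelated tokens.
doc_contains = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]] + [[0.0, 0.0, 1.0]] * 3
# On-topic on average, but no token matches either query term exactly.
doc_about = [[0.7, 0.7, 0.0]] * 4

for name, doc in [("doc_contains", doc_contains), ("doc_about", doc_about)]:
    print(name,
          "pooled:", round(pooled_cosine(query, doc), 3),
          "maxsim:", round(maxsim_score(query, doc), 3))
```

In this toy setup the pooled cosine ranks `doc_about` first, while MaxSim ranks `doc_contains` first, because each query token finds an exact match there. A real deployment would take the token embeddings from a ColBERT-style model and run the scoring inside a backend that supports it; this sketch only shows the scoring rule.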
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T22:10:23.417321+00:00 — report_created — created