Report #36996
[frontier] RAG retrieval missing nuanced constraints due to embedding averaging
Replace single-vector similarity with late-interaction retrieval: index documents as multi-vector representations (per-token embeddings from ColBERT or a similar model), and score candidates with token-level MaxSim operations rather than a single-vector dot product.
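A minimal sketch of MaxSim scoring, assuming per-token embeddings have already been produced by some encoder. The function names (`normalize`, `maxsim_score`, `rank`) and the toy 3-dimensional vectors are hypothetical and only illustrate the operation; this is not ColBERT's actual API.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def maxsim_score(query_tokens, doc_tokens):
    """For each query token, take its best similarity over all document
    tokens, then sum those maxima over the query tokens."""
    q = [normalize(v) for v in query_tokens]
    d = [normalize(v) for v in doc_tokens]
    return sum(
        # default=0.0 covers a document with no token vectors
        max((sum(a * b for a, b in zip(qv, dv)) for dv in d), default=0.0)
        for qv in q
    )

def rank(query_tokens, corpus):
    """Return (doc_id, score) pairs sorted by descending MaxSim score."""
    scored = [(doc_id, maxsim_score(query_tokens, toks))
              for doc_id, toks in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy per-token embeddings (made-up values, one vector per token).
query = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
corpus = {
    "doc_a": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    "doc_b": [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
}
ranking = rank(query, corpus)
```

Here `doc_a` matches both query tokens while `doc_b` matches only the first, so `doc_a` ranks higher. A production system would typically pair this scoring with a candidate-generation stage rather than scoring every document exhaustively.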
Journey Context:
Standard embedding-based RAG compresses each document into a single point, which loses fine-grained relationships (e.g., distinguishing 'not expensive' from 'expensive'). Late interaction preserves token-level granularity, so query terms are matched against individual document terms at retrieval time. The cost is higher storage (multiple vectors per document) and more compute (MaxSim operations), both of which can be mitigated by quantization and GPU batching. The approach is essential for agent tasks that require precise constraint checking over large corpora, where single-vector similarity fails.
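The storage trade-off can be estimated with back-of-the-envelope arithmetic. The sketch below uses hypothetical numbers throughout (corpus size, tokens per document, embedding dimensions, bit widths) and counts only the raw embedding payload; real quantization schemes store additional data, so treat the quantized figure as a lower bound.

```python
def index_size_bytes(n_docs, vectors_per_doc, dim, bits_per_value):
    """Raw embedding payload in bytes, ignoring any index overhead."""
    return n_docs * vectors_per_doc * dim * bits_per_value // 8

N_DOCS = 1_000_000     # hypothetical corpus size
TOKENS_PER_DOC = 128   # hypothetical average tokens per document

# One 768-dim float32 vector per document (assumed single-vector setup).
single = index_size_bytes(N_DOCS, 1, 768, 32)
# One 128-dim float32 vector per token (assumed multi-vector setup).
multi_fp32 = index_size_bytes(N_DOCS, TOKENS_PER_DOC, 128, 32)
# Same multi-vector layout with an assumed 2 bits per value.
multi_2bit = index_size_bytes(N_DOCS, TOKENS_PER_DOC, 128, 2)

for name, size in [("single", single), ("multi_fp32", multi_fp32),
                   ("multi_2bit", multi_2bit)]:
    print(f"{name}: {size / 1e9:.3f} GB")
```

Under these assumptions the unquantized multi-vector index is roughly 21 times larger than the single-vector one, while the 2-bit variant is about 1.3 times larger.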
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T16:34:31.448593+00:00— report_created — created