Report #671
[architecture] Single-vector dense retrieval misses fine-grained token matches that determine relevance
Use late-interaction retrievers like ColBERT when retrieval quality is critical and the corpus fits the storage budget; deploy them as a second-stage reranker over a cheap first-stage retriever (BM25 or dense) to control cost.
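To check whether a corpus "fits the storage budget", a back-of-envelope estimate is usually enough. The sketch below compares an uncompressed single-vector index with an uncompressed per-token index. The corpus size, the 768- and 128-dimensional embeddings, the 500 tokens per document, and float16 storage are illustrative assumptions, not measurements from this report.

```python
def index_bytes(num_docs, vectors_per_doc, dim, bytes_per_value):
    """Raw size of an uncompressed vector index, ignoring metadata and ANN overhead."""
    return num_docs * vectors_per_doc * dim * bytes_per_value


# Assumed corpus and model shapes (hypothetical values for illustration).
NUM_DOCS = 1_000_000
BYTES_FP16 = 2

# One 768-dim vector per document (typical single-vector bi-encoder shape).
single_vector = index_bytes(NUM_DOCS, 1, 768, BYTES_FP16)

# One 128-dim vector per token, 500 tokens per document (late-interaction shape).
per_token = index_bytes(NUM_DOCS, 500, 128, BYTES_FP16)

ratio = per_token / single_vector

print(f"single-vector index: {single_vector / 1e9:.1f} GB")
print(f"per-token index:     {per_token / 1e9:.1f} GB")
print(f"ratio:               {ratio:.1f}x")
```

Under these assumptions the ratio falls inside the 50-100x range quoted in this report. Shorter documents, lower-dimensional token vectors, or residual compression would shrink it.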
Journey Context:
Dense bi-encoders collapse a passage into one vector, so a query term that appears only in a subphrase is represented by the pooled meaning of the whole passage and can be washed out. ColBERT keeps token-level embeddings for both query and document and scores relevance with MaxSim: each query token is matched to its most similar document token, and those maxima are summed. This captures precise term alignment and phrase matches without running a cross-encoder over every query-document pair. ColBERT-style models have reported stronger results than single-vector dense and sparse baselines on MS MARCO and BEIR. The downside is footprint: a 500-token document becomes 500 vectors, which can mean roughly 50-100x more index storage than a single-vector approach before compression, plus higher query-time compute. Production systems therefore often use two-stage retrieval: a fast candidate generator returns a shortlist (for example, the top 200), and ColBERT reranks it. Choose ColBERT when queries are precise, corpus size is moderate, or recall is the bottleneck; avoid it as a first-stage retriever on billion-document corpora without compression or approximation (e.g., PLAID).
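The MaxSim scoring and shortlist reranking described above can be sketched in a few lines. This is a minimal pure-Python illustration, not ColBERT's actual implementation: the function names `maxsim` and `rerank`, the toy 2-dimensional vectors, and the candidate dictionary are all hypothetical, and the token embeddings are assumed to be already computed and L2-normalized, so a dot product equals cosine similarity.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_vecs, doc_vecs):
    """Late-interaction score: for each query token, take its best-matching
    document token, then sum those maxima over all query tokens."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def rerank(query_vecs, candidates, top_k=10):
    """Rescore a first-stage shortlist with MaxSim and return the top_k.

    candidates maps doc_id -> list of token vectors for that document.
    """
    scored = [(doc_id, maxsim(query_vecs, vecs)) for doc_id, vecs in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy example: a two-token query and two shortlisted documents.
query = [[1.0, 0.0], [0.0, 1.0]]
shortlist = {
    "doc_a": [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]],  # has an exact match per query token
    "doc_b": [[0.6, 0.8]],                          # one blended vector only
}

for doc_id, score in rerank(query, shortlist, top_k=2):
    print(doc_id, round(score, 3))
```

In this toy case `doc_a` ranks first because each query token finds its own exact match, while `doc_b` has only one blended vector to match against, which mirrors the single-vector limitation the report describes. A real deployment would batch this on a GPU and take the shortlist from BM25 or a dense retriever.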
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T11:52:36.176256+00:00— report_created — created