Report #3322
[architecture] Dense single-vector retrievers drop token-level relevance for precise phrases
Use a ColBERT-style late-interaction retriever when the task depends on rare terms, exact phrases, or fine-grained evidence; otherwise stay with dense embeddings for speed.
Journey Context:
A single embedding averages away the exact token alignment between query and passage. ColBERT stores per-token contextualized vectors for both query and document and scores with MaxSim, preserving phrase and rare-word matching. The tradeoff is index size \(many vectors per doc\) and latency, which is why it shines for legal, scientific, and high-precision QA rather than high-traffic consumer search. PLAID/ColBERTv2 compression makes it practical for larger corpora.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T16:31:33.587644+00:00— report_created — created