Report #428
[architecture] Dense retrieval is fast but misses fine-grained keyword matches; ColBERT is too slow to index
Use ColBERTv2 with PLAID indexing when you need token-level explainability, rare term matching, or heterogeneous short documents. Use single-vector dense embeddings when latency, index cost, and simple semantic paraphrase retrieval are the priority.
Journey Context:
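Single-vector dense models collapse a document into one embedding, which dilutes rare tokens and cannot show which part of the document matched. ColBERT uses late interaction: query and document tokens are embedded separately, and at query time each query token is scored against its best-matching document token (MaxSim), with those maxima summed into the document score. This gives fine-grained alignment and strong keyword sensitivity. The tradeoff is indexing time, memory, and query latency, because the index stores one vector per token rather than one per document. ColBERTv2's residual compression and PLAID's centroid-based candidate pruning close part of the gap, but typically still do not match the throughput of single-vector ANN search. Choose ColBERT for high-precision retrieval where you can afford the index; choose dense for high-throughput serving.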
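The difference between the two scoring schemes can be sketched with toy vectors. This is an illustration only, not the ColBERT or PLAID implementation: the one-hot "embeddings", the mean-pooling stand-in for a single-vector encoder, and the helper names (`maxsim_score`, `pooled_score`, `alignment`) are all assumptions made for the example.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def maxsim_score(query_tokens, doc_tokens):
    # Late interaction: each query token takes its best-matching
    # document token, and the per-token maxima are summed.
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

def alignment(query_tokens, doc_tokens):
    # Index of the best-matching document token for each query token;
    # this is what makes a late-interaction match inspectable.
    return [max(range(len(doc_tokens)), key=lambda j: dot(q, doc_tokens[j]))
            for q in query_tokens]

def mean_pool(tokens):
    # Hypothetical stand-in for a single-vector encoder: average, then normalize.
    n = len(tokens)
    return normalize([sum(col) / n for col in zip(*tokens)])

def pooled_score(query_tokens, doc_tokens):
    return dot(mean_pool(query_tokens), mean_pool(doc_tokens))

# Toy 4-dimensional one-hot token embeddings (assumed, not from a real model).
RARE, COMMON, FILLER_A, FILLER_B = (
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
)

query = [RARE, COMMON]
# doc_a contains the rare term once, buried in filler tokens.
doc_a = [RARE, COMMON, FILLER_A, FILLER_B, FILLER_B, FILLER_B]
# doc_b is short, repeats the common term, and lacks the rare term.
doc_b = [COMMON, COMMON, FILLER_A]

for name, doc in (("doc_a", doc_a), ("doc_b", doc_b)):
    print(name, "maxsim:", maxsim_score(query, doc),
          "pooled:", round(pooled_score(query, doc), 3))
```

In this toy setup MaxSim ranks `doc_a` first because the rare term finds an exact token-level match, while the pooled score ranks `doc_b` first because the filler tokens in `doc_a` dilute its averaged vector. Real encoders produce contextual, non-orthogonal embeddings, so the effect is less clean in practice.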
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T07:55:18.773140+00:00— report_created — created