Report #973
[architecture] Dense passage retrievers lose token-level nuance needed for precise fact lookup
Use ColBERT when recall on entity-heavy or fine-grained questions matters more than index size and latency; otherwise stick with a dense bi-encoder for simple semantic similarity and scale.
Journey Context:
Single-vector dense models compress a passage into one embedding, so they struggle when the answer depends on matching specific tokens such as part numbers, legal clauses, or named entities. ColBERT keeps per-token contextualized embeddings and applies late interaction (MaxSim) between query and passage tokens: each query token is matched to its most similar passage token, and those maxima are summed. Because passage token embeddings are precomputed at index time, this gives cross-encoder-like quality while keeping retriever-like speed at query time. The tradeoff is a much larger index and slower queries than a single-vector bi-encoder, and not every vector store supports token-level retrieval. Avoid ColBERT for broad thematic search where a 768-dim bi-encoder is already sufficient.
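A minimal sketch of the scoring difference, assuming hand-made 3-dimensional toy vectors in place of real model output. The function names (`maxsim_score`, `single_vector_score`) and the mean-pooling stand-in for a bi-encoder are illustrative assumptions, not ColBERT's or any library's actual API. It only shows how summing per-token maxima can reward exact token matches that a pooled single vector blurs.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale vectors along the last axis to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def maxsim_score(query_tokens: np.ndarray, passage_tokens: np.ndarray) -> float:
    """Late interaction: for each query token take the best-matching
    passage token (max cosine similarity), then sum over query tokens."""
    q = l2_normalize(query_tokens)      # shape (num_query_tokens, dim)
    p = l2_normalize(passage_tokens)    # shape (num_passage_tokens, dim)
    sim = q @ p.T                       # shape (num_query_tokens, num_passage_tokens)
    return float(sim.max(axis=1).sum())


def single_vector_score(query_tokens: np.ndarray, passage_tokens: np.ndarray) -> float:
    """Hypothetical bi-encoder stand-in: mean-pool the tokens into one
    vector per side, then take the cosine similarity."""
    q = l2_normalize(query_tokens.mean(axis=0))
    p = l2_normalize(passage_tokens.mean(axis=0))
    return float(q @ p)


# Toy data (assumed, not model output): the query has two distinct tokens.
query = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])

# Passage A contains both query tokens exactly, plus one unrelated token.
passage_a = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0],
                      [0.0, 0.0, 1.0]])

# Passage B contains a single token that is a blend of the two query tokens.
passage_b = np.array([[1.0, 1.0, 0.0]])

scores = {
    "maxsim": (maxsim_score(query, passage_a), maxsim_score(query, passage_b)),
    "single": (single_vector_score(query, passage_a), single_vector_score(query, passage_b)),
}
```

On this toy data MaxSim ranks passage A (exact token matches) above passage B, while the pooled single-vector score ranks B above A. Real embeddings are far less clean, so treat this as an illustration of the mechanism rather than evidence of retrieval quality.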
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T15:54:44.905681+00:00 - report_created - created