Report #1048
[architecture] Single-vector dense embeddings lose token-level relevance signals in long, detailed documents.
Use a late-interaction retriever such as ColBERT when you need high recall on long or fact-dense documents. It keeps one contextual vector per token and scores with MaxSim at retrieval time: each query token is matched to its most similar document token, and those maxima are summed. This approaches cross-encoder accuracy without encoding the query and document jointly.
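A minimal sketch of MaxSim scoring, assuming token embeddings are already computed and compared by cosine similarity. The function name `maxsim_score` and the 2-dimensional toy vectors are hypothetical and stand in for real encoder output; this is not ColBERT's actual implementation.

```python
from math import sqrt


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction score: for each query token, take the highest
    cosine similarity to any document token, then sum those maxima."""
    if not doc_vecs:
        raise ValueError("document must contain at least one token vector")
    q = [normalize(v) for v in query_vecs]
    d = [normalize(v) for v in doc_vecs]
    return sum(max(dot(qv, dv) for dv in d) for qv in q)


# Hypothetical toy embeddings, one row per token.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # has an exact match for each query token
doc_b = [[1.0, 1.0], [-1.0, 0.0]]             # only partial matches

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
print(score_a, score_b)
```

Ranking documents by this score places `doc_a` above `doc_b`, because every query token finds a close match among the tokens of `doc_a`.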
Journey Context:
Pooling a long passage into one vector averages away many specific facts; the embedding becomes a coarse summary. ColBERT delays interaction between query and document tokens until scoring, so rare names, numbers, and technical terms still influence the result. The cost is a larger index (one vector per token instead of one per document) and higher latency than a single-vector model, so it is best used as a re-ranker or when retrieval quality matters more than throughput. The tradeoff is documented in the original ColBERT paper and its v2 follow-up.
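The dilution effect can be illustrated with a toy example, assuming mean pooling as the single-vector strategy and cosine similarity for scoring. All vectors here are hypothetical: one document token matches the query exactly and nine others are unrelated. Real encoders and pooling schemes behave less cleanly, so treat this as a sketch of the mechanism only.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    num = sum(x * y for x, y in zip(a, b))
    den = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return num / den if den else 0.0


def mean_pool(token_vecs):
    """Collapse per-token vectors into one vector by averaging each dimension."""
    n = len(token_vecs)
    return [sum(col) / n for col in zip(*token_vecs)]


# Hypothetical data: a single rare term (axis 0) buried among nine
# unrelated tokens (axis 1).
query_tok = [1.0, 0.0, 0.0]
doc_tokens = [[1.0, 0.0, 0.0]] + [[0.0, 1.0, 0.0]] * 9

# Single-vector retrieval: the rare term is averaged into the background.
pooled_score = cosine(query_tok, mean_pool(doc_tokens))

# Late interaction: the best-matching token is still found at scoring time.
late_score = max(cosine(query_tok, t) for t in doc_tokens)

# The price is storage: one vector per token instead of one per document.
pooled_vectors_stored = 1
late_vectors_stored = len(doc_tokens)

print(pooled_score, late_score, late_vectors_stored)
```

In this toy setup the pooled similarity falls to roughly 0.11 while the per-token match stays at 1.0, at ten times the number of stored vectors.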
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T16:56:43.556980+00:00 — report_created — created