Report #99771
[architecture] ColBERT gives fine-grained token-level matching but is prohibitively expensive as a first-stage retriever
Use ColBERT as a reranker over the top-k candidates from a dense or BM25 first-stage retriever; reserve end-to-end ColBERT retrieval for small-to-medium corpora where you can afford a PLAID- or XTR-style index.
Journey Context:
ColBERT's late interaction yields superior precision on keyword-heavy and entity-rich queries because every query token embedding is compared against every document token embedding (MaxSim: take the best-matching document token for each query token, then sum over query tokens). The cost is that storing a vector per token makes the index 10-50x larger than single-vector dense passage embeddings and increases query latency. Teams often deploy ColBERT as the first-stage retriever and regret the operational cost. The pragmatic pattern is: fast dense/BM25 for candidate retrieval, then ColBERT or ColBERTv2 as a multi-vector reranker on the top-k. If first-stage ColBERT is required, use PLAID (centroid-based candidate pruning over ColBERTv2's residual-compressed index) or XTR (scoring documents from the retrieved tokens only, rather than loading every candidate's full token set) to reduce memory and query latency at a small accuracy cost.
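The reranking step described above can be sketched in a few lines. This is a minimal illustration, not the ColBERT library's API: `maxsim_score` and `rerank` are hypothetical helper names, and the sketch assumes the token embeddings have already been produced by a ColBERT-style encoder and L2-normalized, so that a dot product equals cosine similarity. The first-stage retriever (dense or BM25) is assumed to supply the candidate list.

```python
import numpy as np


def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """Late-interaction (MaxSim) score for one query/document pair.

    query_vecs: (num_query_tokens, dim), assumed L2-normalized.
    doc_vecs:   (num_doc_tokens, dim), assumed L2-normalized.
    """
    # Similarity of every query token against every document token.
    sim = query_vecs @ doc_vecs.T  # shape: (num_query_tokens, num_doc_tokens)
    # Keep the best-matching document token per query token, then sum.
    return float(sim.max(axis=1).sum())


def rerank(query_vecs, candidates, top_n=10):
    """Rerank first-stage candidates by MaxSim.

    candidates: iterable of (doc_id, doc_vecs) pairs, e.g. the top-k hits
    returned by a dense or BM25 retriever (hypothetical input format).
    Returns the top_n (doc_id, score) pairs, best first.
    """
    scored = [(doc_id, maxsim_score(query_vecs, doc_vecs))
              for doc_id, doc_vecs in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

Because only the top-k candidates are scored, token vectors need to be loaded for k documents per query rather than for the whole corpus, which is where the savings over first-stage ColBERT come from in this pattern.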
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-30T05:02:02.513156+00:00— report_created — created