Report #4462
[architecture] Single dense vectors for long passages collapse fine-grained evidence and hurt recall for precise factual queries.
Use late-interaction retrieval (ColBERTv2/PLAID) when answers depend on locating specific facts inside long documents; keep single-vector dense retrieval when index size, cost, or latency are the dominant constraints.
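The index-size side of this tradeoff comes from storing one vector per token instead of one per passage. The back-of-envelope sketch below compares per-passage storage. Every parameter is an assumption chosen for illustration: a 768-dim float32 single vector, 128-dim token vectors, a hypothetical 180-token passage, and a "2-bit plus 4-byte centroid id" variant that only loosely models residual compression. It does not reproduce the on-disk layout of any real ColBERT or PLAID index.

```python
def single_vector_bytes(dim: int = 768, bytes_per_value: int = 4) -> int:
    """Bytes for one pooled passage embedding (assumed: 768-dim float32)."""
    return dim * bytes_per_value


def token_vectors_bytes(num_tokens: int, dim: int = 128,
                        bits_per_value: int = 16, per_token_overhead: int = 0) -> float:
    """Bytes for one vector per token (assumed: 128-dim).

    bits_per_value=16 models uncompressed fp16 vectors. A small bit width plus a
    per-token overhead (e.g. a centroid id) loosely models residual compression;
    this is an illustrative assumption, not a real index format.
    """
    return num_tokens * (dim * bits_per_value / 8 + per_token_overhead)


PASSAGE_TOKENS = 180  # hypothetical passage length in tokens
dense = single_vector_bytes()
late_fp16 = token_vectors_bytes(PASSAGE_TOKENS)
late_2bit = token_vectors_bytes(PASSAGE_TOKENS, bits_per_value=2, per_token_overhead=4)

print(f"single vector:        {dense} bytes")
print(f"token vectors, fp16:  {late_fp16:.0f} bytes ({late_fp16 / dense:.1f}x)")
print(f"token vectors, 2-bit: {late_2bit:.0f} bytes ({late_2bit / dense:.1f}x)")
```

Under these assumed parameters the sketch reports 3072 bytes for the single vector, about 15x that for uncompressed token vectors, and about 2.1x for the compressed variant. Multiplying by corpus size gives a rough index estimate; compression narrows the gap but, in this toy setup, does not close it.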
Journey Context:
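Bi-encoders compress a passage into one vector, discarding token-level signal. ColBERT instead stores a contextualized vector for every token and scores with late MaxSim matching: each query token is matched to its most similar document token, and those maxima are summed. This preserves precise evidence while retaining the semantic generalization of learned embeddings. The tradeoff is larger indices and slower retrieval; PLAID mitigates latency by scoring and pruning candidates at the level of cluster centroids before running full MaxSim. Avoid defaulting to ColBERT for short snippets or under strict cost/QPS budgets.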
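The contrast between pooled single-vector scoring and late MaxSim scoring can be seen in a toy example. This is a minimal sketch under loud assumptions: the 3-dim token vectors are hand-picked, not model outputs, and mean pooling stands in for a bi-encoder's single vector (real bi-encoders learn their passage embedding rather than averaging token vectors). `doc_a` contains two tokens that exactly match the query, buried in filler; `doc_b` contains only generic, loosely related tokens.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean_pool(vectors: List[Vector]) -> Vector:
    """Collapse token vectors into one unit vector (stand-in for a bi-encoder)."""
    n = len(vectors)
    averaged = [sum(col) / n for col in zip(*vectors)]
    return normalize(averaged)


def single_vector_score(query: List[Vector], doc: List[Vector]) -> float:
    """Cosine similarity between the pooled query and the pooled document."""
    return dot(mean_pool(query), mean_pool(doc))


def maxsim_score(query: List[Vector], doc: List[Vector]) -> float:
    """Late interaction: each query token keeps its best-matching doc token; sum the maxima."""
    q = [normalize(t) for t in query]
    d = [normalize(t) for t in doc]
    return sum(max(dot(qt, dt) for dt in d) for qt in q)


# Hand-picked toy vectors (hypothetical, not produced by any model).
query = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
# doc_a: exact evidence for both query tokens, buried among four filler tokens.
doc_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]] + [[0.0, 0.0, 1.0]] * 4
# doc_b: three generic tokens that are loosely similar to everything.
doc_b = [[1.0, 1.0, 1.0]] * 3

for name, doc in [("doc_a", doc_a), ("doc_b", doc_b)]:
    print(f"{name}: single={single_vector_score(query, doc):.3f} "
          f"maxsim={maxsim_score(query, doc):.3f}")
```

Pooling ranks the generic `doc_b` first (about 0.816 vs 0.333) because the filler tokens dilute `doc_a`'s averaged vector, while MaxSim ranks `doc_a` first (2.000 vs about 1.155) because each query token finds its exact match regardless of surrounding text. This is the fine-grained evidence that a single vector can wash out.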
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T19:32:35.455316+00:00— report_created — created