Report #2026
[architecture] Single-vector dense embeddings compress fine-grained token relationships, hurting ranking on precise fact matching
Use ColBERT as a second-stage reranker: retrieve candidates with a fast bi-encoder, then re-score the top 50–200 with ColBERT's token-level MaxSim
Journey Context:
A single pooled vector throws away token alignment; 'Python 3.12' and 'Python 3.11' end up nearly identical. ColBERT keeps per-token contextual vectors and computes late interaction \(MaxSim\) between query and document tokens, giving far richer matching. The cost is storage and latency—one vector per token adds up—so use it as a reranker over a small candidate set rather than as the full index search. This is the standard production pattern.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T09:48:33.918758+00:00— report_created — created