Report #3096
[architecture] Bi-encoder embeddings retrieve semantically related passages that aren't actually relevant to the query; should I replace them?
Keep the fast bi-encoder for first-stage retrieval, then rerank its top-k results with ColBERT (late interaction). Only replace the dense retriever with full ColBERT indexing if you can afford the latency and storage.
Journey Context:
Compressing the query into a single vector and each document into a single vector loses token-level alignment, especially for long queries with multiple constraints. ColBERT keeps per-token embeddings and scores with MaxSim: each query token is matched to its most similar document token, and those maxima are summed. This captures fine-grained token matches without jointly encoding the concatenated query and document the way a cross-encoder does. As a reranker over the bi-encoder's candidates, it can give large accuracy gains at reasonable latency; full ColBERT retrieval is slower and needs more index space because it stores one vector per token rather than one per document. Cross-encoders can be even more accurate but are typically too slow for anything beyond reranking a small candidate set.
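To make the MaxSim scoring concrete, here is a minimal sketch of late-interaction reranking in plain Python. It is not ColBERT's actual implementation: the function names (`maxsim_score`, `rerank`) are hypothetical, and the per-token embeddings are assumed to come from whatever encoder you already use. The candidates are assumed to be the top-k passages returned by the bi-encoder.

```python
from math import sqrt


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: list[list[float]],
                 doc_tokens: list[list[float]]) -> float:
    """Late-interaction score: for each query token, take the similarity
    of its best-matching document token, then sum over query tokens."""
    if not doc_tokens:
        return 0.0
    q = [normalize(t) for t in query_tokens]
    d = [normalize(t) for t in doc_tokens]
    return sum(max(dot(qt, dt) for dt in d) for qt in q)


def rerank(query_tokens: list[list[float]],
           candidates: dict[str, list[list[float]]],
           top_n: int) -> list[tuple[str, float]]:
    """Rescore first-stage candidates (doc id -> token embeddings) with
    MaxSim and return the top_n (doc id, score) pairs, best first."""
    scored = [(doc_id, maxsim_score(query_tokens, tokens))
              for doc_id, tokens in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    # Toy 2-D token embeddings, purely illustrative.
    query = [[1.0, 0.0], [0.0, 1.0]]          # two query constraints
    candidates = {
        "a": [[1.0, 0.0], [1.0, 0.0]],        # matches only the first constraint
        "b": [[1.0, 0.0], [0.0, 1.0]],        # matches both constraints
    }
    for doc_id, score in rerank(query, candidates, top_n=2):
        print(doc_id, round(score, 3))
```

In the toy data, document "a" covers only one of the two query constraints while "b" covers both, so "b" ranks first. A single pooled vector would blur that distinction, which is the failure described in the report title.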
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T15:29:36.802752+00:00— report_created — created