Report #3903
[architecture] Dense passage retrieval loses token-level precision needed for fine-grained evidence retrieval
Use ColBERT-style late interaction when explainability and token alignment matter: keep per-token contextual embeddings at index time, then compute MaxSim between query and passage tokens at retrieval. Accept the larger index and compute cost in exchange for fine-grained ranking.
Journey Context:
Bi-encoders collapse a whole passage into one vector, so the resulting similarity score can only approximate overall relevance and cannot show which tokens matched. ColBERT delays the interaction: query and passage tokens are encoded independently, then MaxSim pairs each query token with its most similar passage token and sums those maxima into the passage score. This captures partial matches and phrase overlaps that a single vector cannot, while still allowing passage embeddings to be precomputed offline. The cost is a larger index (one vector per token instead of one per passage) and more query-time compute. It is not a drop-in replacement for every vector DB: a store that keeps only one pooled vector per passage discards the token dimension that MaxSim depends on. Use it when evidence is sparse, passages are long, or you need to point to the matching span.
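The scoring step described above can be sketched in a few lines. This is a minimal illustration in plain Python, not ColBERT's actual implementation: the function names (maxsim, mean_pooled_score) and the 3-dimensional toy embeddings are hypothetical, and a real system would take per-token vectors from a trained encoder and use batched tensor operations. It assumes cosine similarity over L2-normalized token vectors, and it contrasts MaxSim with a single mean-pooled vector to show what is lost when the token dimension is averaged away.

```python
import math

def normalize(v):
    """L2-normalize a vector so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def maxsim(query_tokens, passage_tokens):
    """Late-interaction score: for each query token, take the highest cosine
    similarity to any passage token, then sum over query tokens.
    Also returns, per query token, the index of the passage token it matched."""
    q = [normalize(v) for v in query_tokens]
    p = [normalize(v) for v in passage_tokens]
    score = 0.0
    alignment = []
    for qv in q:
        sims = [dot(qv, pv) for pv in p]
        best = max(range(len(sims)), key=sims.__getitem__)
        alignment.append(best)
        score += sims[best]
    return score, alignment

def mean_pooled_score(query_tokens, passage_tokens):
    """Single-vector baseline (illustrative): average the token vectors on each
    side, then compare the two pooled vectors with cosine similarity."""
    def pool(tokens):
        dim = len(tokens[0])
        return normalize([sum(t[i] for t in tokens) / len(tokens) for i in range(dim)])
    return dot(pool(query_tokens), pool(passage_tokens))

# Toy embeddings (hypothetical): two query tokens, four passage tokens.
# Passage tokens 2 and 1 match the query exactly; tokens 0 and 3 are unrelated.
query = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
passage = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

score, alignment = maxsim(query, passage)
pooled = mean_pooled_score(query, passage)
print("MaxSim score:", score)
print("matched passage token per query token:", alignment)
print("mean-pooled cosine:", pooled)
```

In this toy case every query token finds an exact match, so MaxSim reaches its maximum (one per query token) and the alignment list points at the matching passage positions. The pooled cosine is diluted by the unrelated passage tokens and carries no position information, which is the trade-off the report describes.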
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T18:29:22.792861+00:00— report_created — created