Report #63845
[frontier] Dense retrieval returns wrong chunks due to semantic dilution in long documents
Adopt late-interaction architectures (ColBERTv2) for token-level relevance scoring, combined with hierarchical parent-child chunking
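For reference, the token-level relevance score used by ColBERT-style late interaction is the MaxSim sum. Each query token embedding is matched against its most similar document token embedding, and these maxima are summed over the query tokens. Here $\mathbf{E}_{q_i}$ is the (L2-normalized) embedding of the $i$-th query token, $\mathbf{E}_{d_j}$ is that of the $j$-th document token, and $|q|$ and $|d|$ are the token counts:

```latex
S(q, d) = \sum_{i=1}^{|q|} \max_{j=1}^{|d|} \; \mathbf{E}_{q_i} \cdot \mathbf{E}_{d_j}^{\top}
```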
Journey Context:
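Bi-encoders (the standard dense-retrieval setup in RAG) compress each document chunk into a single vector. This averages away fine-grained detail and tends to surface generic chunks. ColBERT-style late interaction instead keeps one embedding per token and scores relevance with MaxSim, matching each query token to its most similar document token.
Pattern: index documents with ColBERTv2 (directly or through the RAGatouille wrapper), retrieve small child chunks with token-level scoring, then inject hierarchical context by passing each retrieved child's parent section to the LLM. This improves precision on technical documents and helps with the 'lost in the middle' problem, because the LLM receives a few focused, coherent sections instead of many loosely related chunks.
Tradeoff: 4-5x more storage than a single-vector index.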
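The following is a minimal pure-Python sketch of the retrieve-child, return-parent pattern described in this report, and it rests on some clear assumptions. The token embeddings are hand-written toy vectors, not output from a ColBERTv2 encoder. The names `maxsim`, `Child`, and `retrieve_with_parent_context` are hypothetical and chosen for illustration; they are not RAGatouille's or ColBERT's API. The sketch scores each child chunk with MaxSim, then returns the parent sections of the best-ranked children, with duplicates removed, as LLM context.

```python
import math
from dataclasses import dataclass


def normalize(v):
    """L2-normalize a vector so dot products equal cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_tokens, doc_tokens):
    """Late-interaction score: for each query token embedding, take its best
    match among the document token embeddings, then sum those maxima."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


@dataclass
class Child:
    """A small retrievable chunk that points back to its parent section."""
    child_id: str
    parent_id: str
    token_embs: list  # one normalized vector per token (hypothetical toy data)


def retrieve_with_parent_context(query_tokens, children, parents, k=1):
    """Rank child chunks by MaxSim, then return the text of up to k distinct
    parent sections, in the order their best child was ranked."""
    ranked = sorted(children, key=lambda c: maxsim(query_tokens, c.token_embs),
                    reverse=True)
    seen, contexts = set(), []
    for child in ranked:
        if child.parent_id not in seen:
            seen.add(child.parent_id)
            contexts.append(parents[child.parent_id])
        if len(contexts) == k:
            break
    return contexts


# Hypothetical toy corpus: 3-dimensional "token embeddings" stand in for the
# much higher-dimensional ones a real ColBERTv2 encoder would produce.
parents = {
    "p1": "Parent p1: full section on index configuration",
    "p2": "Parent p2: full section on billing",
}
children = [
    Child("c1", "p1", [normalize([1.0, 0.0, 0.0]), normalize([0.0, 1.0, 0.0])]),
    Child("c2", "p2", [normalize([0.0, 0.0, 1.0])]),
]
query = [normalize([1.0, 0.1, 0.0]), normalize([0.0, 1.0, 0.1])]

print(retrieve_with_parent_context(query, children, parents, k=1))
# → ['Parent p1: full section on index configuration']
```

A real deployment would use a ColBERTv2 index and its search engine in place of this exhaustive scoring loop. Storing one vector per token is where the storage tradeoff comes from. The deduplication step prevents several children of the same section from filling the context with the same parent repeated.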
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:38:55.716957+00:00— report_created — created