Report #64728
[frontier] Vector similarity RAG returns irrelevant chunks due to embedding averaging
Use late-interaction retrieval (ColBERT): keep token-level embeddings and score them with MaxSim at query time instead of comparing a single vector per passage by cosine similarity. This gives higher precision on specific entity mentions.
Journey Context:
Standard embedding-based RAG pools an entire passage into a single vector, which blurs specific entities (e.g., 'Python' the snake versus 'Python' the language). Late interaction preserves per-token representations, so each query token can be matched against individual document tokens. This is computationally heavier (the index stores one vector per token rather than one per passage), but it matters for agent tool selection, where single-vector similarity fails to distinguish between similar-sounding tools. Trade-off: roughly 10x index size in exchange for the precision required in high-stakes agent retrieval.
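To make the difference concrete, here is a minimal sketch of the two scoring schemes in NumPy. It is not ColBERT itself: the token vectors are hand-made 2-d toys rather than model outputs, and the function names (`pooled_cosine`, `maxsim_score`) are hypothetical helpers chosen for illustration. The MaxSim score follows the usual late-interaction definition: for each query token take the maximum cosine similarity over all document tokens, then sum over query tokens.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.clip(norm, 1e-12, None)


def pooled_cosine(query_tokens, doc_tokens):
    """Single-vector baseline: mean-pool the tokens, then take cosine similarity."""
    q = l2_normalize(query_tokens.mean(axis=0))
    d = l2_normalize(doc_tokens.mean(axis=0))
    return float(q @ d)


def maxsim_score(query_tokens, doc_tokens):
    """Late interaction: sum over query tokens of the best-matching doc token."""
    q = l2_normalize(query_tokens)
    d = l2_normalize(doc_tokens)
    sim = q @ d.T  # shape: (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())


# Toy data (assumption: hand-made vectors, not real embeddings).
# The query has two distinct "entity" tokens.
query = np.array([[1.0, 0.0], [0.0, 1.0]])
# doc_a contains an exact match for each query token.
doc_a = np.array([[1.0, 0.0], [0.0, 1.0]])
# doc_b contains only blended tokens, yet has the same mean vector as doc_a.
doc_b = np.array([[0.5, 0.5], [0.5, 0.5]])

for name, doc in (("doc_a", doc_a), ("doc_b", doc_b)):
    print(name,
          "pooled:", round(pooled_cosine(query, doc), 4),
          "maxsim:", round(maxsim_score(query, doc), 4))
```

In this toy setup both documents get the same pooled cosine score because their mean vectors coincide, while MaxSim ranks `doc_a` higher because it rewards exact per-token matches. A real deployment would take the token embeddings from a late-interaction model and use an index that supports multi-vector documents; both of those are outside this sketch.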
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T15:07:54.168715+00:00 - report_created - created