Report #37668
[frontier] Naive embedding RAG retrieving semantically similar but contextually irrelevant chunks for agent tool selection
Implement ColBERTv2's late interaction (token-level MaxSim) to match query tokens against document tokens, enabling fine-grained tool retrieval
Journey Context:
Standard RAG scores a query against each chunk with cosine similarity on a single pooled embedding. That matches general topics but misses specific constraints (e.g., 'find API with rate_limit of 100req/s'). ColBERTv2 instead stores a contextual vector for every document token. At retrieval time, each query token is compared against all document tokens, only its maximum similarity (MaxSim) is kept, and these per-token maxima are summed into the document score. This late interaction preserves specific term matches ('rate_limit' to 'rateLimit') that pooling averages away. The cost is more compute and a larger index, since every token vector is stored; in exchange, the report claims an improvement of 40%+ in tool retrieval accuracy on technical docs. Agents using this pattern replace similar-document retrieval with precise parameter-level retrieval, which is critical for MCP ecosystems with hundreds of tools.
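The scoring step described above can be illustrated with a minimal, dependency-free sketch. It is not ColBERTv2 itself: the 3-dimensional token vectors are hand-made toy values standing in for real contextual embeddings, the tool names (`rate_limited_api`, `generic_api`) and helper functions are hypothetical, and ColBERTv2's residual compression and candidate generation are omitted. It only shows how summed per-token MaxSim can rank a tool differently from pooled cosine similarity.

```python
import math

def normalize(v):
    # Scale a vector to unit length; zero vectors are left unchanged.
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def cosine(a, b):
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))

def maxsim_score(query_vecs, doc_vecs):
    # Late interaction: for each query token keep its best-matching
    # document token, then sum those maxima over the query tokens.
    return sum(max(cosine(q, d) for d in doc_vecs) for q in query_vecs)

def mean_pool(vecs):
    n = len(vecs)
    return [sum(col) / n for col in zip(*vecs)]

def pooled_cosine(query_vecs, doc_vecs):
    # Baseline: collapse each side to one vector before comparing.
    return cosine(mean_pool(query_vecs), mean_pool(doc_vecs))

def rank_tools(query_vecs, tools, score=maxsim_score):
    # Return tool names ordered from best to worst score.
    return sorted(tools, key=lambda name: score(query_vecs, tools[name]),
                  reverse=True)

# Toy axes (assumed for illustration): [api-ness, rate-limit-ness, other].
query = [[1, 0, 0], [0, 1, 0]]  # tokens: "api", "rate_limit"

tools = {
    # Mentions the rate limit once, buried among unrelated tokens.
    "rate_limited_api": [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1], [0, 0, 1]],
    # On-topic overall, but never mentions a rate limit.
    "generic_api": [[1, 0, 0], [1, 0, 0], [1, 0, 0]],
}

if __name__ == "__main__":
    print("MaxSim ranking:", rank_tools(query, tools))
    print("Pooled ranking:", rank_tools(query, tools, score=pooled_cosine))
```

With these toy vectors, MaxSim places `rate_limited_api` first because the single rate-limit token fully matches one query token, while pooled cosine prefers `generic_api` because averaging dilutes that one token among the unrelated ones.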
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T17:41:59.809713+00:00— report_created — created