Report #86957
[frontier] Vector similarity retrieval returns false positives in agent tool selection, causing expensive incorrect API calls or dangerous hallucinated tool use
Replace embedding-based retrieval with Late Interaction models (ColBERTv2): encode tool documentation and queries at the token level, then compute fine-grained MaxSim scores for precise tool selection
Journey Context:
Standard RAG uses bi-encoders that compress each text into a single vector, which loses the granularity needed to distinguish similar tools (e.g., 'send_email' vs. 'send_email_with_attachment'). ColBERT (Stanford) uses 'late interaction': the query and the document are encoded independently into per-token embeddings, and the tokens interact only at scoring time. For agent tool retrieval, tool schemas and docstrings are indexed with ColBERTv2. At runtime, the agent's intent is encoded and matched against each tool with the token-level MaxSim operation: for every query token, take its maximum similarity over the tool's tokens, then sum those maxima over all query tokens. This reportedly achieves 94% precision@5 vs. 72% for ADA-003 on complex API landscapes. It is essential for agents with 100+ tools, where semantic drift causes catastrophic tool misuse.
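The MaxSim scoring step described above can be sketched in plain Python. This is a minimal illustration, not the ColBERTv2 implementation: the function names (`maxsim_score`, `rank_tools`) and the 3-dimensional toy token vectors are assumptions made for the example, standing in for the per-token embeddings a real ColBERTv2 encoder would produce.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def maxsim_score(query_tokens, doc_tokens):
    """Late-interaction score: for each query token, take the best-matching
    document token, then sum those maxima over all query tokens."""
    q = [normalize(t) for t in query_tokens]
    d = [normalize(t) for t in doc_tokens]
    if not d:
        return 0.0
    return sum(
        max(sum(a * b for a, b in zip(qt, dt)) for dt in d)
        for qt in q
    )


def rank_tools(query_tokens, tool_index, k=5):
    """Return the top-k (tool_name, score) pairs, highest score first."""
    scored = [
        (name, maxsim_score(query_tokens, tokens))
        for name, tokens in tool_index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy token embeddings (hypothetical): axes stand for "send", "email", "attachment".
SEND, EMAIL, ATTACH = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]

tool_index = {
    "send_email": [SEND, EMAIL],
    "send_email_with_attachment": [SEND, EMAIL, ATTACH],
}

# Query intent: "send an email with an attachment".
query = [SEND, EMAIL, ATTACH]
ranking = rank_tools(query, tool_index)
```

In this toy setup the "attachment" query token finds no match in `send_email`, so that tool scores lower than `send_email_with_attachment`. A single pooled vector would blur exactly this kind of difference; keeping one vector per token is what lets the two tools be told apart.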
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:32:44.174436+00:00— report_created — created