Report #23062
[frontier] Vector similarity retrieval for tool selection fails when tool descriptions are sparse or overlap semantically, causing agents to select wrong tools or miss specialized variants
Use a late interaction model (ColBERT or ColBERTv2) for tool retrieval: index tool documentation as token-level embeddings, then score each tool with MaxSim between query tokens and document tokens. This catches fine-grained lexical matches without giving up semantic matching.
Journey Context:
Standard embedding retrieval (bi-encoders) compresses an entire document into a single vector, which blurs the keyword distinctions that tool selection depends on. For example, 'list_files' and 'list_directories' may embed almost identically even though only one of them is recursive. Late interaction keeps one embedding per token and defers the comparison to query time: for each query token it takes the maximum similarity over all document tokens, then sums those maxima (MaxSim). This costs more compute and index storage than a single-vector lookup, but the cost is justified for high-stakes tool selection, where precision matters more than recall. Implementation: index tool schemas and usage examples with ColBERTv2, retrieve the top-k tools, then pass their descriptions to the LLM for the final selection. In this report, the hybrid approach (late interaction retrieval + LLM rerank) gave better tool selection accuracy than dense embedding + BM25; no benchmark figures are attached.
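The MaxSim scoring step described above can be sketched in plain Python. This is a minimal illustration, not the ColBERT library's API: `embed_tokens`, `VOCAB`, and the toy tool descriptions are hypothetical stand-ins. One-hot vectors take the place of real contextual token embeddings so the arithmetic is easy to follow. A real setup would presumably get these vectors from a ColBERTv2 encoder.

```python
import math

# Hypothetical toy vocabulary; a real encoder produces dense contextual vectors.
VOCAB = ["list", "files", "directories", "recursive"]

def embed_tokens(tokens):
    """Stand-in for a late-interaction encoder: one one-hot vector per token."""
    return [[1.0 if tok == v else 0.0 for v in VOCAB] for tok in tokens]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def maxsim_score(query_vecs, doc_vecs):
    """For each query token, take its best match among doc tokens, then sum."""
    return sum(max(cosine(q, d) for d in doc_vecs) for q in query_vecs)

def retrieve_top_k(query_vecs, tool_index, k=2):
    """Rank tools by MaxSim score and keep the k best as (name, score) pairs."""
    scored = [(name, maxsim_score(query_vecs, vecs))
              for name, vecs in tool_index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy index: token embeddings are computed once, at indexing time.
# The token lists are invented for illustration only.
tool_index = {
    "list_files": embed_tokens(["list", "files"]),
    "list_directories": embed_tokens(["list", "directories", "recursive"]),
}

query_vecs = embed_tokens(["list", "directories"])
candidates = retrieve_top_k(query_vecs, tool_index, k=2)
print(candidates)  # [('list_directories', 2.0), ('list_files', 1.0)]
```

The query token "directories" has no match in the `list_files` entry, so that tool scores 1.0 against 2.0 for `list_directories`. The shortlist in `candidates` is what would then be handed to the LLM for the final selection.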
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T17:07:08.940948+00:00— report_created — created