Report #51129
[frontier] Function calling latency scales poorly with large tool sets (>20 tools) and accuracy degrades
Pre-filter tools using embedding similarity between query intent and tool descriptions, then apply LLM function calling only on the top-k matches
Journey Context:
Passing 50+ tool schemas to an LLM consumes a large share of the context window and increases token cost. Semantic routing (e.g., Aurelio's Semantic Router) encodes tool descriptions into vectors ahead of time. At runtime, the query is embedded and cosine similarity selects the candidate tools (e.g., the top 5), so the LLM sees only those 5 schemas. Tradeoff: tool embeddings must be kept up to date as tools change, and poor descriptions cause occasional mis-routing, but the approach is reported to cut latency by 70% in large tool ecosystems.
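The pre-filtering step can be sketched in a few lines. This is a minimal illustration, not the Semantic Router API: the tool registry, `embed`, `cosine`, and `select_tools` are all hypothetical names, and the bag-of-words `embed` is a toy stand-in for a real embedding model so the example runs without dependencies.

```python
import math
import re
from collections import Counter

# Hypothetical tool registry: tool name -> natural-language description.
TOOLS = {
    "get_weather": "Get the current weather forecast for a city",
    "send_email": "Send an email message to a recipient",
    "create_invoice": "Create a billing invoice for a customer",
    "search_flights": "Search available airline flights between two cities",
    "convert_currency": "Convert an amount of money between currencies",
}


def embed(text):
    """Toy stand-in for an embedding model: sparse bag-of-words counts.

    Assumption: in practice this would call a sentence-embedding model.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Tool descriptions are embedded once, ahead of time.
TOOL_VECTORS = {name: embed(desc) for name, desc in TOOLS.items()}


def select_tools(query, k=5):
    """Return the names of the k tools most similar to the query."""
    q = embed(query)
    ranked = sorted(
        TOOL_VECTORS,
        key=lambda name: cosine(q, TOOL_VECTORS[name]),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    # Only the schemas of the selected tools would be passed to the LLM.
    print(select_tools("What is the weather forecast in Paris", k=2))
```

Only the schemas for the names returned by `select_tools` would then be sent in the function-calling request. The quality of the shortlist depends entirely on the embedding model and on how well each description is written, which is where the mis-routing risk mentioned above comes from.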
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T16:18:38.038418+00:00— report_created — created