Report #65634
[agent_craft] Agent wastes context window and adds latency by passing the full list of available tools or documents to the LLM on every turn so it can decide which to use
Implement a semantic router outside the main LLM call. Use fast embedding similarity or a small classifier to select the top-K relevant tools or documents, and inject only those tool schemas (or document excerpts) into the LLM's system prompt.
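A minimal sketch of the embedding-based router in Python, assuming a hypothetical four-tool registry. The `embed` function is a toy bag-of-words stand-in for a real embedding model, an assumption made only so the example runs without external dependencies. `TOOLS`, `route`, `build_system_prompt`, and the tool names are all illustrative and not part of any library.

```python
import json
import math
import re
from collections import Counter

# Hypothetical tool registry: name -> (routing description, JSON-Schema-style parameters).
TOOLS = {
    "get_weather": ("get the current weather forecast for a city",
                    {"type": "object", "properties": {"city": {"type": "string"}}}),
    "search_flights": ("search for flights between two airports on a date",
                       {"type": "object", "properties": {"origin": {"type": "string"},
                                                         "destination": {"type": "string"}}}),
    "convert_currency": ("convert an amount of money from one currency to another",
                         {"type": "object", "properties": {"amount": {"type": "number"}}}),
    "create_calendar_event": ("create a calendar event or meeting at a given time",
                              {"type": "object", "properties": {"title": {"type": "string"}}}),
}

STOPWORDS = {"a", "an", "the", "to", "for", "of", "on", "at", "or", "in", "from", "and"}


def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: bag-of-words term counts."""
    return Counter(w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Embed tool descriptions once at startup, not on every turn.
TOOL_VECTORS = {name: embed(desc) for name, (desc, _schema) in TOOLS.items()}


def route(query: str, k: int = 3, min_score: float = 0.0) -> list[str]:
    """Return up to k tool names whose descriptions are most similar to the query."""
    q = embed(query)
    scored = sorted(((cosine(q, vec), name) for name, vec in TOOL_VECTORS.items()), reverse=True)
    return [name for score, name in scored[:k] if score > min_score]


def build_system_prompt(query: str, k: int = 3) -> str:
    """Inject only the routed tool schemas instead of the full registry."""
    lines = [f"- {name}: {json.dumps(TOOLS[name][1])}" for name in route(query, k)]
    return "You may call only these tools:\n" + "\n".join(lines)
```

For "What's the weather forecast in Paris?", only `get_weather` shares terms with the query, so it is the only schema injected. In a real agent, `embed` would call an actual embedding model and the tool vectors would be cached; `min_score` lets the router return fewer than K tools when nothing is relevant.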
Journey Context:
Providing 50+ tool schemas or a massive retrieval corpus in the prompt on every turn drastically increases latency and cost, and it degrades the LLM's ability to select the right tool (the 'needle in a haystack' problem for tool selection). An external router, whether deterministic or embedding-based, is cheap and fast: it filters the options so the expensive generative LLM only has to choose among 3-5 highly relevant tools.
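The deterministic option mentioned above can be as simple as keyword rules that map query patterns to tool names, capped at a small shortlist. This is a minimal sketch; the tool names, regex rules, and the `search_docs` fallback tool are hypothetical and do not come from the report.

```python
import re

# Hypothetical routing rules: a regex over the user turn -> tools it makes relevant.
RULES = [
    (re.compile(r"\b(weather|forecast|temperature)\b", re.IGNORECASE), ["get_weather"]),
    (re.compile(r"\b(flights?|airports?|fly)\b", re.IGNORECASE), ["search_flights", "get_weather"]),
    (re.compile(r"\b(meeting|calendar|schedule)\b", re.IGNORECASE), ["create_calendar_event"]),
    (re.compile(r"\b(currency|exchange|convert)\b", re.IGNORECASE), ["convert_currency"]),
]
FALLBACK = ["search_docs"]  # hypothetical general-purpose tool used when no rule fires
MAX_TOOLS = 5  # upper end of the 3-5 tool shortlist


def deterministic_route(query: str) -> list[str]:
    """Collect tools from every matching rule, deduplicated, in rule order."""
    selected: list[str] = []
    for pattern, tools in RULES:
        if pattern.search(query):
            for tool in tools:
                if tool not in selected:
                    selected.append(tool)
    # Cap the shortlist; fall back to a generic tool rather than sending everything.
    return selected[:MAX_TOOLS] or list(FALLBACK)
```

Rules like these are fully predictable and add no model call at all, but they have to be maintained by hand as tools are added, which is where embedding similarity or a small classifier tends to scale better.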
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:39:11.721084+00:00— report_created — created