Report #1538
[gotcha] Tool selection accuracy falls off a cliff beyond ~20-30 tools rather than degrading gradually
Keep the active tool set under 20 tools per context. Implement a two-tier tool architecture: a small set of always-loaded 'router' tools that discover and activate domain-specific tool subsets on demand. Make the response to MCP's tools/list dynamic so that it returns only the currently active subset; don't expose all tools at once.
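A minimal sketch of the two-tier idea, assuming an in-memory registry. Every name here (Tool, ToolRegistry, ROUTER_TOOLS, ACTIVE_TOOL_LIMIT, the list_domains and activate_domain router tools) is hypothetical and not part of MCP or any SDK. In an MCP server, the tools/list handler would return whatever list_tools() yields, and the server would typically send notifications/tools/list_changed after a domain is activated or deactivated.

```python
from dataclasses import dataclass

# Assumption: the "under 20" guidance is enforced as a strict upper bound.
ACTIVE_TOOL_LIMIT = 20


@dataclass(frozen=True)
class Tool:
    name: str
    description: str


# Tier 1: router tools that are always exposed (hypothetical names).
ROUTER_TOOLS = [
    Tool("list_domains", "List the tool domains that can be activated."),
    Tool("activate_domain", "Load one domain's tools into the active set."),
]


class ToolRegistry:
    """Tier 2: domain tools stay hidden until a router tool activates them."""

    def __init__(self, domains: dict[str, list[Tool]]) -> None:
        self._domains = domains
        self._active: list[str] = []

    def list_domains(self) -> list[str]:
        return sorted(self._domains)

    def list_tools(self) -> list[Tool]:
        # This is what a tools/list handler would return: routers + active domains.
        tools = list(ROUTER_TOOLS)
        for name in self._active:
            tools.extend(self._domains[name])
        return tools

    def activate_domain(self, name: str) -> None:
        if name not in self._domains:
            raise KeyError(f"unknown domain: {name}")
        if name in self._active:
            return  # already loaded, nothing to do
        projected = len(self.list_tools()) + len(self._domains[name])
        if projected >= ACTIVE_TOOL_LIMIT:
            raise ValueError(
                f"activating {name!r} would expose {projected} tools; "
                f"deactivate another domain first"
            )
        self._active.append(name)

    def deactivate_domain(self, name: str) -> None:
        if name in self._active:
            self._active.remove(name)
```

With three 8-tool domains, activating two of them yields 18 visible tools (2 routers + 16), and activating the third is rejected until one is deactivated. Rejecting rather than silently evicting is a design choice in this sketch; an LRU eviction policy would be an equally reasonable alternative.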
Journey Context:
The common assumption is that each additional tool degrades selection accuracy a little, roughly linearly. In practice, LLMs exhibit a threshold effect: selection accuracy holds reasonably well up to a point, then collapses. Past that point the model starts hallucinating tool names, calling irrelevant tools, or fixating on a single tool for unrelated tasks. This is worse than random selection because the model is confidently wrong. The root cause is that tool descriptions compete for attention in the context, and beyond a critical density the model can no longer discriminate between similar tools. Progressive disclosure (loading tools on demand) avoids this collapse by keeping the active set small.
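Since the failure is tied to similar tools competing with each other, one rough pre-flight check is to flag tool descriptions that read almost alike before they end up in the same active set. The sketch below is only an illustration: the function name, the sample tools, and the 0.8 threshold are assumptions, and character-level similarity from Python's difflib is a crude stand-in for how a model actually confuses tools.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_description_pairs(
    descriptions: dict[str, str], threshold: float = 0.8
) -> list[tuple[str, str, float]]:
    """Return (tool_a, tool_b, ratio) for description pairs at or above threshold."""
    flagged = []
    # Sort by tool name so the output order is deterministic.
    for (name_a, desc_a), (name_b, desc_b) in combinations(
        sorted(descriptions.items()), 2
    ):
        ratio = SequenceMatcher(None, desc_a.lower(), desc_b.lower()).ratio()
        if ratio >= threshold:
            flagged.append((name_a, name_b, ratio))
    return flagged


# Hypothetical tool set: two near-duplicate search tools and one unrelated tool.
SAMPLE_TOOLS = {
    "search_issues": "Search issues in the tracker by keyword.",
    "search_tickets": "Search tickets in the tracker by keyword.",
    "send_email": "Send an email message to a recipient.",
}

for name_a, name_b, ratio in similar_description_pairs(SAMPLE_TOOLS):
    print(f"{name_a} / {name_b}: similarity {ratio:.2f}")
```

Pairs flagged this way are candidates for merging, rewording, or placing in separate on-demand subsets so they are never active together.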
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T01:33:09.327114+00:00— report_created — created