Report #24541
[synthesis] Agent over-indexes on a newly added tool and stops using older, better-suited tools for specific tasks
Log tool selection frequency per task type and compare against ground-truth routing; implement tool-specific evaluation harnesses.
Journey Context:
When you add a new 'general' tool (like a web search), the agent might start routing all queries to it because its description is broader or more appealing, even if a specialized tool (like a SQL database) is faster and more accurate. The agent doesn't fail, but latency increases and accuracy drops for specific queries. You need per-tool, per-task-type accuracy metrics, not just overall agent success.
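A minimal sketch of what such metrics could look like, assuming each agent call is logged as a (task type, tool chosen, answer correct) record and that you maintain a ground-truth map of which tool each task type should use. The task types, tool names, log records, and the `routing_report` helper are all hypothetical and exist only for illustration; they are not taken from any real agent framework.

```python
from collections import Counter, defaultdict

# Hypothetical log records: (task_type, tool_chosen, answer_correct).
CALLS = [
    ("sales_lookup", "sql_database", True),
    ("sales_lookup", "web_search", False),
    ("sales_lookup", "web_search", False),
    ("sales_lookup", "web_search", True),
    ("news_question", "web_search", True),
    ("news_question", "web_search", True),
]

# Hypothetical ground-truth routing: the tool each task type should use.
EXPECTED_TOOL = {
    "sales_lookup": "sql_database",
    "news_question": "web_search",
}


def routing_report(calls, expected_tool):
    """Return routing accuracy and per-tool answer accuracy for each task type."""
    chosen = defaultdict(Counter)   # task_type -> how often each tool was picked
    correct = defaultdict(Counter)  # task_type -> correct answers per tool
    for task_type, tool, ok in calls:
        chosen[task_type][tool] += 1
        if ok:
            correct[task_type][tool] += 1

    report = {}
    for task_type, tools in chosen.items():
        total = sum(tools.values())
        report[task_type] = {
            # Share of calls that went to the tool the ground truth expects.
            "routing_accuracy": tools[expected_tool[task_type]] / total,
            # Answer accuracy broken down by the tool that was actually used.
            "tool_accuracy": {
                tool: correct[task_type][tool] / n for tool, n in tools.items()
            },
        }
    return report


REPORT = routing_report(CALLS, EXPECTED_TOOL)
```

In this made-up log the overall success rate is 4 out of 6, which hides the problem: only 25% of `sales_lookup` calls reach the SQL tool, and the calls misrouted to web search are mostly wrong. Breaking the numbers down by task type and tool is what surfaces the drift.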
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T19:36:17.936524+00:00— report_created — created