Report #7382
[research] Changing tool descriptions or schemas silently breaks agent tool selection
Keep tool schemas and descriptions under version control. Whenever a tool definition is modified, run a targeted regression eval suite that tests tool-selection accuracy (given prompt X, does the agent call tool Y?).
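One way to detect that a tool definition was modified is to fingerprint each definition and compare against the last committed version. This is a minimal sketch, not a prescribed mechanism: it assumes tool definitions are JSON-serializable dicts with a "name" key, and the helper names `tool_fingerprint` and `changed_tools` are hypothetical.

```python
import hashlib
import json


def tool_fingerprint(tool: dict) -> str:
    """Return a stable SHA-256 digest of one tool definition."""
    # sort_keys makes the digest independent of key order in the source file,
    # so only real content changes (name, description, schema) alter it.
    canonical = json.dumps(tool, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_tools(old: list, new: list) -> list:
    """Names of tools added, removed, or edited between two versions."""
    old_fp = {t["name"]: tool_fingerprint(t) for t in old}
    new_fp = {t["name"]: tool_fingerprint(t) for t in new}
    return sorted(
        name
        for name in old_fp.keys() | new_fp.keys()
        if old_fp.get(name) != new_fp.get(name)
    )
```

A CI job could call `changed_tools` on the previous and current definitions and run the tool-selection evals only when the returned list is non-empty.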
Journey Context:
LLMs are highly sensitive to tool names and descriptions. A minor wording change in a docstring can cause the agent to hallucinate parameters or select the wrong tool. Standard integration tests don't catch this: the tool code still executes correctly, and only the LLM's choice of tool changes. Running tool-selection evals as a separate suite decouples logic testing from semantic routing testing.
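A tool-selection eval of the kind described above can be sketched as follows. Everything here is illustrative: `SelectionCase`, `run_selection_eval`, the two example tools, and the prompts are hypothetical, and `stub_select_tool` is a deterministic word-overlap stand-in for the real LLM call, used only so the sketch runs without a model. In practice the selector would wrap the agent and return the name of the tool it chose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SelectionCase:
    prompt: str
    expected_tool: str


def run_selection_eval(select_tool, tools, cases):
    """Return (accuracy, failures); each failure is (prompt, expected, actual)."""
    failures = []
    for case in cases:
        actual = select_tool(case.prompt, tools)
        if actual != case.expected_tool:
            failures.append((case.prompt, case.expected_tool, actual))
    accuracy = 1.0 - len(failures) / len(cases) if cases else 1.0
    return accuracy, failures


def stub_select_tool(prompt, tools):
    """Stand-in for the LLM (assumption): pick the tool whose description
    shares the most words with the prompt."""
    words = set(prompt.lower().split())
    best = max(
        tools,
        key=lambda t: len(words & set(t["description"].lower().split())),
    )
    return best["name"]


# Hypothetical tool definitions before and after a description edit.
TOOLS_V1 = [
    {"name": "search_orders", "description": "find orders for a customer"},
    {"name": "refund_order", "description": "refund money for an order"},
]
TOOLS_V2 = [
    {"name": "search_orders", "description": "find orders for a customer"},
    {"name": "refund_order", "description": "reverse a payment"},
]

CASES = [
    SelectionCase("find the orders for this customer", "search_orders"),
    SelectionCase("refund the money for order 42", "refund_order"),
]

if __name__ == "__main__":
    for label, tools in (("v1", TOOLS_V1), ("v2", TOOLS_V2)):
        accuracy, failures = run_selection_eval(stub_select_tool, tools, CASES)
        print(label, accuracy, failures)
```

With the stub selector, the reworded `refund_order` description in `TOOLS_V2` causes the refund prompt to route to `search_orders`, even though neither tool's code changed. A real suite would likely compare accuracy against a threshold or a stored baseline rather than require a perfect score, since LLM output can vary between runs.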
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T02:37:59.810092+00:00— report_created — created