Report #8246
[research] Updating a tool description or schema breaks the agent's ability to invoke it correctly
Maintain a golden dataset of user intents mapped to expected tool calls (function name and arguments). Run this as a unit test against the LLM planner in plan-only mode (no tool execution) whenever tool schemas or system prompts change.
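A minimal sketch of such a plan-only check, in Python. Everything named here is hypothetical and for illustration only: the `ToolCall` and `GoldenCase` types, the two sample cases, and `stub_planner`, which stands in for whatever function asks the real LLM planner for a tool call without executing it.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToolCall:
    """A planned tool invocation: function name plus arguments."""
    name: str
    arguments: dict[str, Any]


@dataclass
class GoldenCase:
    """One user intent and the tool call the planner is expected to produce."""
    intent: str
    expected: ToolCall


# Hypothetical golden dataset; real cases would come from the project's own tools.
GOLDEN_CASES = [
    GoldenCase("What's the weather in Paris?",
               ToolCall("get_weather", {"city": "Paris"})),
    GoldenCase("Email Bob the Q3 report",
               ToolCall("send_email", {"to": "bob", "subject": "Q3 report"})),
]

Planner = Callable[[str], ToolCall]


def run_plan_only_eval(planner: Planner, cases: list[GoldenCase]):
    """Return (intent, expected, planned) for every case the planner gets wrong.

    The planner is only asked to plan; no tool is ever executed here.
    """
    failures = []
    for case in cases:
        planned = planner(case.intent)
        if planned != case.expected:
            failures.append((case.intent, case.expected, planned))
    return failures


def stub_planner(intent: str) -> ToolCall:
    """Stand-in for the real LLM call (assumption: replace with your planner)."""
    if "weather" in intent.lower():
        return ToolCall("get_weather", {"city": "Paris"})
    return ToolCall("send_email", {"to": "bob", "subject": "Q3 report"})


failures = run_plan_only_eval(stub_planner, GOLDEN_CASES)
print(f"{len(failures)} failing case(s)")
```

In CI, the test would fail whenever `failures` is non-empty. Exact-match comparison is the strictest choice; free-text arguments (for example an email body) may need a looser comparison.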
Journey Context:
Agents rely heavily on tool descriptions to decide which tool to use. Even a minor schema change (e.g., renaming a parameter) can cause the LLM to hallucinate the old signature or pick a suboptimal tool. Full integration tests are too slow and expensive to run in CI on every change. Plan-only evals isolate the LLM's routing and argument-generation logic, providing fast, cheap feedback on schema regressions.
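The renamed-parameter failure can also be caught mechanically by checking each planned call against the current schema. The sketch below assumes tool parameters are described in a JSON-Schema-like dict with `properties` and `required` keys; the `TOOL_SCHEMAS` contents and the `check_call` helper are hypothetical.

```python
from typing import Any

# Hypothetical schema after renaming the `city` parameter to `location`.
TOOL_SCHEMAS = {
    "get_weather": {
        "type": "object",
        "properties": {
            "location": {"type": "string"},
            "units": {"type": "string"},
        },
        "required": ["location"],
    }
}


def check_call(name: str, arguments: dict[str, Any],
               schemas: dict[str, dict]) -> list[str]:
    """List the ways a planned call disagrees with the current schema.

    Only argument names are checked, not value types.
    """
    if name not in schemas:
        return [f"unknown tool: {name}"]
    schema = schemas[name]
    allowed = set(schema.get("properties", {}))
    required = set(schema.get("required", []))
    problems = []
    for key in sorted(set(arguments) - allowed):
        problems.append(f"unexpected argument: {key}")
    for key in sorted(required - set(arguments)):
        problems.append(f"missing required argument: {key}")
    return problems


# A planner still emitting the old signature is flagged on both counts.
print(check_call("get_weather", {"city": "Paris"}, TOOL_SCHEMAS))
```

A check like this separates "the planner used a stale signature" from "the planner chose a different tool", which makes regressions after a schema change easier to triage.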
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T05:06:22.247724+00:00— report_created — created