Report #7577
[gotcha] Agent systematically picks tool A over tool B even when tool B is correct — description wording is the invisible culprit
A/B test tool descriptions with realistic queries. Avoid generic verbs like 'handle', 'manage', or 'process'; use specific, distinctive action verbs instead. Ensure no two tool descriptions share the same first sentence or key phrase. Add negative examples to descriptions: 'Use this for X, NOT for Y; use [other_tool] for Y.' Audit for position bias by rotating the tool order and measuring selection accuracy for each ordering.
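The wording checks above (generic verbs, shared first sentences) are easy to automate before running any model. The following is a minimal sketch, not a definitive linter: the tool names, the sample descriptions, and the GENERIC_VERBS list are hypothetical and should be replaced with your own.

```python
import re
from itertools import combinations

# Hypothetical list; extend it with verbs that are ambiguous in your own tool set.
GENERIC_VERBS = {"handle", "handles", "manage", "manages", "process", "processes"}


def first_sentence(description: str) -> str:
    """Return the lowercased first sentence with whitespace normalized."""
    head = re.split(r"(?<=[.!?])\s+", description.strip(), maxsplit=1)[0]
    return " ".join(head.lower().split())


def lint_descriptions(tools: dict[str, str]) -> list[str]:
    """Return warnings for a {tool_name: description} mapping."""
    warnings = []
    # Check 1: generic verbs that do not distinguish one tool from another.
    for name, desc in tools.items():
        words = set(re.findall(r"[a-z]+", desc.lower()))
        for verb in sorted(words & GENERIC_VERBS):
            warnings.append(f"{name}: generic verb '{verb}'")
    # Check 2: pairs of tools whose descriptions open with the same sentence.
    for (a, desc_a), (b, desc_b) in combinations(tools.items(), 2):
        if first_sentence(desc_a) == first_sentence(desc_b):
            warnings.append(f"{a} and {b}: identical first sentence")
    return warnings


# Hypothetical tool set used only for illustration.
tools = {
    "search_files": "Search for files. Matches file names against a glob pattern.",
    "grep_content": "Search for files. Matches file contents against a regex.",
    "file_admin": "Manage files on disk.",
}
found = lint_descriptions(tools)
for warning in found:
    print(warning)
```

On this sample the lint flags the generic verb in file_admin and the shared opening sentence of search_files and grep_content. It only catches exact first-sentence matches, so near-duplicates such as 'search for files' vs 'find files' still need the A/B test against real queries.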
Journey Context:
LLMs select tools based on semantic similarity between the query and the tool descriptions. If two tools have similar descriptions, the model will consistently prefer one of them, usually the one listed first or the one with the shorter description. The preference for the first-listed tool is a position bias effect documented in tool-use evaluations. The surprising part: even minor wording changes like 'search for files' vs 'find files' can swing selection accuracy by 20%+. Teams spend hours debugging tool logic and server code when the real issue is a three-word overlap between two descriptions. The fix is to treat tool descriptions as a UX problem for the model, not as documentation for humans. Write them to maximize discriminability, not completeness.
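The position bias described above can be measured by replaying the same labeled queries against every rotation of the tool list. The following is a rough sketch under stated assumptions: `select` stands in for whatever function wraps your model call (its signature here is hypothetical), and `first_listed_selector` is a deliberately biased stub, not a real model.

```python
from collections import Counter
from typing import Callable, Sequence

Tool = tuple[str, str]  # (name, description)
Selector = Callable[[str, Sequence[Tool]], str]  # hypothetical wrapper signature


def rotations(tools: Sequence[Tool]) -> list[list[Tool]]:
    """Every cyclic rotation, so each tool occupies each position exactly once."""
    items = list(tools)
    return [items[i:] + items[:i] for i in range(len(items))]


def audit_position_bias(select: Selector, tools: Sequence[Tool],
                        cases: Sequence[tuple[str, str]]):
    """Run each (query, expected_tool) case against every rotation.

    Returns (accuracy, picks_by_position), where picks_by_position counts
    how often the chosen tool sat at each index of the list it was shown in.
    """
    correct = total = 0
    picks_by_position = Counter()
    for order in rotations(tools):
        names = [name for name, _ in order]
        for query, expected in cases:
            chosen = select(query, order)
            # Raises ValueError if the selector returns an unknown tool name.
            picks_by_position[names.index(chosen)] += 1
            correct += chosen == expected
            total += 1
    return correct / total, picks_by_position


def first_listed_selector(query: str, tools: Sequence[Tool]) -> str:
    """Stub with worst-case position bias: always picks the first tool."""
    return tools[0][0]


# Hypothetical tools and labeled queries used only for illustration.
tools = [
    ("search_files", "Find files whose NAME matches a glob pattern."),
    ("grep_content", "Find lines INSIDE files that match a regex."),
]
cases = [
    ("find TODO comments inside source files", "grep_content"),
    ("list every *.py file", "search_files"),
]
accuracy, picks_by_position = audit_position_bias(first_listed_selector, tools, cases)
print(accuracy, dict(picks_by_position))
```

With an unbiased selector, picks should spread across positions and accuracy should stay stable from one rotation to the next. The stub picks index 0 every time, which is the signature to look for in real results.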
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T03:12:52.887843+00:00— report_created — created