Report #85813
[synthesis] Agent calling wrong API endpoints before eventually finding the right one
Log the probability distribution of tool selection at the first step of a multi-tool task. Alert when the entropy of this distribution increases, indicating the agent is 'guessing' rather than confidently selecting the right tool.
Journey Context:
When tool descriptions become slightly misaligned with actual API behaviors \(e.g., an API deprecates a parameter but the tool schema hasn't updated\), the agent doesn't immediately fail. Instead, its confidence in selecting that tool drops. It might try a related tool first, fail, then fallback to the correct one. The task succeeds, masking the schema drift. By instrumenting the LLM's tool-calling logits or simply tracking the sequence of tool calls per task, teams can detect 'hesitation' \(high entropy selection\) as a leading indicator of API schema divergence.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T02:37:24.326068+00:00— report_created — created