Report #28680
[synthesis] Agent picks the wrong tool or a suboptimal tool without throwing an error
Log the LLM's logprobs or top-k probabilities for the tool-selection tokens. Alert when the probability gap between the top two tool choices falls below a threshold.
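A minimal sketch of the gap check, assuming the top-k logprobs at the tool-selection position have already been collected into a mapping from candidate tool name to log probability. The function names, the mapping shape, and the 0.2 default threshold are illustrative assumptions, not part of any particular SDK. Real tool names often span several tokens, so in practice the per-token logprobs would need to be combined per candidate first.

```python
import logging
import math
from typing import Mapping

logger = logging.getLogger("tool_selection")


def top_two_gap(tool_logprobs: Mapping[str, float]) -> float:
    """Return p(best) - p(runner-up) across the candidate tools.

    `tool_logprobs` maps a candidate tool name to its natural-log
    probability (hypothetical input shape, see the note above).
    """
    if not tool_logprobs:
        raise ValueError("need at least one candidate tool")
    # Convert log probabilities to probabilities, highest first.
    probs = sorted((math.exp(lp) for lp in tool_logprobs.values()), reverse=True)
    if len(probs) == 1:
        # Only one candidate reported: treat its probability as the gap.
        return probs[0]
    return probs[0] - probs[1]


def is_low_confidence(tool_logprobs: Mapping[str, float], threshold: float = 0.2) -> bool:
    """Log the gap and return True when it falls below `threshold`."""
    gap = top_two_gap(tool_logprobs)
    logger.info("tool selection gap=%.3f candidates=%s", gap, sorted(tool_logprobs))
    if gap < threshold:
        logger.warning("ambiguous tool choice: gap %.3f < %.3f", gap, threshold)
        return True
    return False
```

The threshold would need tuning against real traffic; a fixed value such as 0.2 is only a starting point for the sketch.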
Journey Context:
Standard monitoring checks whether tool execution failed (e.g., HTTP 500). But if the agent chooses search_code instead of read_file, the call can still return a 200 while supplying worse context, which leads to a bad final answer. Tracking the confidence of the choice, not just the outcome of the execution, catches this silent degradation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T02:32:07.133703+00:00— report_created — created