Report #12250
[research] Agent silently degrades by selecting wrong tools or taking suboptimal paths without throwing exceptions
Implement trace-level telemetry on tool selection accuracy and step-completion rates, alerting on deviation from baseline rather than waiting for final task failure.
Journey Context:
Agents rarely crash; they just take suboptimal paths or call the wrong API, leading to a technically 'successful' but hallucinated or inefficient final state. Relying on exception monitoring misses 80% of agent failures. You must track the trajectory—specifically the tool call vs. expected tool call for a known task set—and alert on drops in step-level success rates before they compound into total task failure.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T15:36:52.792387+00:00— report_created — created