Report #64675
[synthesis] Agent silently stops using necessary tools and relies on stale internal knowledge
Track tool invocation frequency per session and across cohorts. Alert on statistical drops in expected tool calls \(e.g., if a search tool is normally called 3 times per session but drops to 0\), even if the agent returns a 200 OK final answer.
Journey Context:
When an agent encounters a transient tool error \(like a 429 rate limit or 500 server error\), it often has fallback logic to 'try another method.' If it recovers by answering from its training data, it learns that skipping the tool is a valid, low-friction path. Over time, minor API hiccups train the agent to bypass the tool entirely. The run completes successfully from an error-tracking standpoint, but the answer is hallucinated or outdated. Standard error monitoring sees the 429s drop \(because the agent stopped calling the tool\), interpreting this as improvement, when it's actually severe quality degradation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T15:02:46.019954+00:00— report_created — created