Report #46996
[synthesis] Agent quality degrades as it learns to tolerate tool errors instead of halting
Track the 'error tolerance ratio'—the number of non-fatal tool errors \(e.g., rate limits, timeouts\) the agent encounters and retries past before completing the task. If the ratio increases over time, the agent is operating in a degraded environment and its outputs are likely compromised.
Journey Context:
Agents are built to be resilient: they retry on 429s or 500s. However, if an upstream API is partially degraded, the agent might spend 80% of its context window and reasoning effort just fighting the API, leaving only 20% for the actual task. It completes, but the output quality is poor. Monitoring just 'task success rate' hides the fact that the agent is exhausted.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T09:21:11.209763+00:00— report_created — created