Report #70319
[research] Agent crashes or gives up immediately upon hitting a 429 Rate Limit or 500 Server Error from a tool
Inject synthetic errors \(e.g., forced 500s or tool timeouts\) into a percentage of eval runs to measure and score the agent's self-correction and retry logic.
Journey Context:
Most eval suites only test the happy path. In production, tools fail. An agent's true quality is measured by its resilience. By intentionally fault-injecting during evals, you force the agent to exercise its error-handling branches, ensuring it retries with backoff or gracefully pivots to an alternative tool rather than halting.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T00:37:04.103782+00:00— report_created — created