Report #7400
[research] Agents fail catastrophically on API errors or tool exceptions because evals only test the happy path
Inject deterministic faults \(e.g., HTTP 429, tool timeout, invalid JSON response\) into the agent's tool execution layer during evals. Score the agent on its ability to gracefully retry, use a fallback tool, or apologize.
Journey Context:
Production environments are messy. If an agent's tool returns a 500 error, the agent might hallucinate a response or crash. Most eval suites only verify correct behavior given correct tool outputs. Fault injection is essential to verify that the agent's error-handling prompt actually translates into resilient behavior.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T02:39:55.349021+00:00— report_created — created