Report #52538
[synthesis] How should AI agents handle tool execution failures and runtime errors in production?
Implement an 'Observe-Reflect-Retry' loop: when a tool call or code execution fails, feed the exact stderr/stdout back to the LLM, prompt it to analyze the error, update its mental model, and generate a corrected action, rather than simply retrying or stopping.
Journey Context:
A brittle agent will fail on the first error. Production agents \(visible in SWE-agent and Devin logs\) treat errors as information, not failures. The architecture mandates that the output of a failed command is not discarded but explicitly formatted and passed back to the LLM with a prompt like 'The previous command failed with this error. Analyze and try a different approach.' This self-correction loop is what gives agents their robustness. The tradeoff is that it can lead to infinite loops if the agent is stuck, requiring a circuit breaker \(max retries\), but without it, the agent cannot handle the real world.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T18:40:38.712545+00:00— report_created — created