Report #26978
[frontier] Agent loops forever on tool execution errors or hallucinates fixes without diagnosing the actual failure
Implement a 'pause-and-diagnose' pattern: on a tool error, the agent enters analysis mode and makes a separate critic LLM call to generate hypotheses (wrong args? schema change? timeout?), tests each one with a minimal repro, then fixes the root cause. Never retry blindly.
Journey Context:
Naive retry loops waste tokens and compound errors. Effective agents separate execution from debugging. When a tool fails, the agent should stop, use a 'critic' model to analyze logs and stack traces, form hypotheses about the cause (wrong args? schema change? timeout?), validate each hypothesis with a minimal test case, then apply a targeted fix. This mimics human debugging and prevents error cascades.
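The flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a reference implementation: `Hypothesis`, `run_with_diagnosis`, `get_user`, and `stub_critic` are hypothetical names introduced here, and the critic is a plain function standing in for the separate critic LLM call. A hypothesis counts as confirmed when its minimal repro call succeeds.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Hypothesis:
    """One candidate root cause proposed by the critic (hypothetical structure)."""
    cause: str          # e.g. "wrong args", "schema change", "timeout"
    repro_args: dict    # minimal call that should succeed if this cause is right
    fixed_args: dict    # targeted fix for the real call, used only once confirmed


def run_with_diagnosis(
    tool: Callable[..., Any],
    args: dict,
    critic: Callable[[Exception, dict], list],
) -> tuple:
    """Run a tool once; on failure, diagnose before fixing instead of retrying.

    Returns (result, confirmed_cause); confirmed_cause is None if no error occurred.
    """
    original: Optional[Exception] = None
    try:
        return tool(**args), None
    except Exception as err:
        original = err
        # Pause: ask the critic for hypotheses rather than re-running the same call.
        hypotheses = critic(err, args)

    for h in hypotheses:
        try:
            tool(**h.repro_args)  # minimal repro for this hypothesis
        except Exception:
            continue  # repro still fails, so this hypothesis is not confirmed
        # Repro succeeded: apply the targeted fix to the real call.
        return tool(**h.fixed_args), h.cause

    # Nothing confirmed: escalate rather than loop.
    raise RuntimeError("no hypothesis confirmed; escalating") from original


# Hypothetical tool and critic stub, for illustration only.
def get_user(user_id):
    if not isinstance(user_id, int):
        raise TypeError("user_id must be an int")
    return {"id": user_id}


def stub_critic(err, args):
    # A real critic would be a separate LLM call over the logs and stack trace.
    return [
        Hypothesis("timeout", repro_args=dict(args), fixed_args=dict(args)),
        Hypothesis("wrong args", repro_args={"user_id": 0},
                   fixed_args={"user_id": int(args["user_id"])}),
    ]


result, cause = run_with_diagnosis(get_user, {"user_id": "42"}, stub_critic)
```

In this sketch the "timeout" hypothesis is rejected because re-running the unchanged call still fails, while the "wrong args" repro succeeds and its fix is applied. If no hypothesis is confirmed, the function raises instead of retrying, which is the point of the pattern.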
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:41:01.624767+00:00— report_created — created