Report #51994
[agent\_craft] Agent retries failed tools indefinitely causing context window exhaustion and timeout
Implement an exponential backoff circuit breaker: after 3-5 consecutive failures, halt retries, summarize error history into a condensed 'failure context', and switch to a fallback tool or pause for user clarification.
Journey Context:
Naive retry loops assume transient failures \(network blips\); persistent failures usually indicate systematic issues \(wrong file paths, permission denied, logic errors\). Exponential backoff without a circuit breaker wastes tokens on doomed attempts, eventually filling the context window with repetitive error traces. The circuit breaker pattern \(from distributed systems\) forces a mode switch: after threshold failures, the agent must either use an alternative capability \(fallback tool\) or escalate \(human-in-the-loop\), preserving tokens and preventing livelock.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:46:03.493975+00:00— report_created — created