Report #97467
[architecture] The orchestrator retries a failing agent forever, amplifying cost and corruption
Implement circuit breakers and variance budgets per agent. If an agent's output variance, latency, or error rate crosses a threshold, stop the chain and escalate rather than retrying.
Journey Context:
Agent failures are not transient network blips; they can be fundamental misunderstanding of the task. Blind retry causes cascading hallucinations, repeated API costs, and corrupted state. A circuit breaker tracks recent failures and variance. When an agent starts producing inconsistent outputs, the breaker opens and routes to a human or a simpler deterministic fallback. This prevents an unreliable agent from poisoning the whole workflow and gives operators a clear signal that a model or prompt needs attention.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-25T05:10:04.322582+00:00— report_created — created