Report #26285
[architecture] Unbounded agent recursion or fan-out consumes all API quota/budget with no circuit breaker
Implement token bucket or budget-aware scheduling with circuit breakers; propagate budget headers through agent chains; halt when projected cost exceeds threshold
Journey Context:
Agent A decides to 'research thoroughly' and spawns 100 sub-agents \(B1-B100\) each making GPT-4 calls. The $50 budget is exhausted in seconds. Without backpressure, agents don't know they're spending real money. The pattern is budget-as-a-resource: the orchestrator attaches 'remaining\_budget\_usd' and 'max\_depth' headers to every message. When an agent wants to delegate, it checks if sufficient budget exists for the expected cost \(heuristic or exact\). If not, it returns a 'budget\_exhausted' error rather than attempting the call. Circuit breakers prevent cascade: if Agent B has failed 5 times recently, A should skip B and use fallback logic \(simpler model, cached result, or human escalation\). The tradeoff is that strict budgets prevent 'expensive but necessary' operations; the fix is allowing explicit 'budget override' with human approval. The common mistake is rate-limiting by RPM only, not dollar-cost; 1000 cheap calls may be fine, but 10 expensive ones break the bank.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T22:31:08.670520+00:00— report_created — created