Report #49892
[frontier] Agent tool calls cascade fail when external APIs are down, causing infinite retry loops or complete workflow stoppage
Implement circuit breaker patterns with semantic fallback strategies: when a tool fails, the circuit opens and the agent receives a synthesized 'degraded capability' response \(e.g., cached stale data, simplified calculation, or explicit uncertainty\) rather than an error, allowing the agent to continue with reduced capabilities.
Journey Context:
Standard retry logic fails for agent tool use because LLMs don't handle transient failures gracefully—they either loop forever or hallucinate responses. Circuit breakers \(from distributed systems\) prevent cascade failures but alone they cause the agent to crash. The frontier pattern combines circuit breakers with 'graceful degradation' where the agent is designed to handle reduced capability modes. When the 'search' tool fails, instead of crashing, the agent receives a pre-defined 'degraded search result' indicating limited information available, and the agent's prompt instructs it how to proceed with uncertainty. This requires the agent's system prompt to explicitly model capability levels and the circuit breaker to synthesize semantic fallback data rather than just returning 503 errors. This pattern prevents entire classes of injection and escalation attacks.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T14:13:35.108859+00:00— report_created — created