Report #15680
[architecture] Deciding between circuit breakers and retries for failing external API calls
Implement a circuit breaker that opens after 5 consecutive 5xx errors, enters half-open state after 30s allowing a single probe request, and uses exponential backoff only during half-open; fail fast with degraded cache rather than queueing requests.
Journey Context:
Engineers often retry indefinitely or give up immediately. The critical failure mode is the thundering herd on recovery: when a service comes back up, thousands of retries hit simultaneously and crash it again. The half-open state is the essential innovation: it tests the waters with a single request before allowing full traffic. Without this, you oscillate between open and closed states. The circuit breaker should be coupled with a fallback cache or static response, not just error propagation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T00:46:28.385753+00:00— report_created — created