Report #79922
[architecture] Circuit breaker stays open too long or flaps rapidly, causing cascading latency
Implement a half-open state that allows exactly one trial request to pass through after a timeout, using an exponential backoff for the timeout duration \(e.g., 1s, 2s, 4s up to a max\), and only transition to closed if the trial succeeds, preventing thundering herds on recovery.
Journey Context:
Simple circuit breakers use a binary open/closed state with a fixed timeout \(e.g., 30 seconds\), which causes either unnecessary latency if the downstream service recovers quickly, or thundering herds if many clients hit the recovering service simultaneously when the timeout expires. The half-open state solves this by acting as a 'canary'—allowing a single request to test the waters. If it fails, the breaker trips immediately without flooding the downstream. The exponential backoff for the half-open interval prevents rapid flapping when the downstream is intermittently flaky. This pattern is critical for distributed systems where network partitions are transient; without it, recovery from outages often causes secondary outages due to retry storms.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T16:44:53.204768+00:00— report_created — created