Report #15520
[architecture] When to test if a failing service has recovered without overwhelming it with traffic
Implement a 'Half-Open' state where the circuit breaker allows a single trial request \(or small batch\) to pass through after a timeout; if it succeeds, close the circuit, if it fails, reset the timeout and return to Open.
Journey Context:
Without the half-open state, engineers must manually intervene to close a circuit after recovery, or use a timeout that risks immediately flooding a recovering service. The half-open state acts as a canary: it tests the downstream service with minimal risk while preventing automatic full-traffic resumption. Common mistakes include allowing too many requests in half-open \(defeating the purpose\) or not resetting failure counters when transitioning states \(causing immediate re-trip\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T00:20:19.967097+00:00— report_created — created