Report #20872
[architecture] How to configure circuit breaker thresholds to avoid cascading failures without unnecessary tripping
Set the failure threshold to 5 errors within a 60-second sliding window, keep the circuit open for 30 seconds before moving to half-open and letting a test request through, and use a sliding window (not a fixed one) to avoid window-boundary effects; always pair the breaker with a fallback strategy rather than simply failing fast.
Journey Context:
Developers often set circuit breakers too sensitively (tripping on 2-3 errors) or too leniently (requiring 100 errors), and both defeat the purpose. The sensitive setting causes 'flapping' (rapid open/close cycles) under load spikes; the lenient setting lets failures cascade before protection kicks in. The hard-won insight is statistical: the window must be long enough to smooth out noise (60s) yet short enough to react quickly, and the threshold must tolerate transient blips (fewer than 5 errors) while still catching sustained problems. The critical mistake is ignoring the 'half-open' state: if the breaker closes fully without first letting a single test request through, it risks re-opening immediately. Fixed windows also create edge effects: with a threshold of 5, 4 errors at 59s and 4 more at 61s land in different windows, so 8 errors within two seconds cause no trip. A sliding window counts them together. Finally, never open the circuit without a fallback (degraded mode, cached data, or queueing), because returning an immediate error to the user is often worse than a degraded or delayed response.
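The behavior described above can be sketched in a few dozen lines of Python. This is a minimal, single-threaded illustration, not a production implementation: the class name `CircuitBreaker`, its parameters (`failure_threshold`, `window_seconds`, `open_timeout`, `clock`), and the `call(operation, fallback)` method are all hypothetical names chosen for this example. It assumes every exception counts as a failure and that one successful probe in half-open is enough to close the circuit.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Illustrative sliding-window breaker (hypothetical API, not thread-safe)."""

    def __init__(
        self,
        failure_threshold: int = 5,      # errors that trip the breaker
        window_seconds: float = 60.0,    # length of the sliding window
        open_timeout: float = 30.0,      # time spent open before half-open
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.window_seconds = window_seconds
        self.open_timeout = open_timeout
        self._clock = clock
        self._failures: Deque[float] = deque()  # timestamps of recent failures
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.open_timeout:
            return "half_open"
        return "open"

    def call(self, operation: Callable[[], T], fallback: Callable[[], T]) -> T:
        state = self.state
        if state == "open":
            # Protect the dependency, but still answer via the fallback.
            return fallback()
        try:
            result = operation()  # in half-open this is the test request
        except Exception:
            self._record_failure(state)
            return fallback()
        if state == "half_open":
            # The probe succeeded: close the circuit and start fresh.
            self._opened_at = None
            self._failures.clear()
        return result

    def _record_failure(self, state: str) -> None:
        now = self._clock()
        if state == "half_open":
            # The probe failed: re-open and wait another open_timeout.
            self._opened_at = now
            return
        self._failures.append(now)
        # Sliding window: drop failures older than window_seconds.
        while self._failures and now - self._failures[0] >= self.window_seconds:
            self._failures.popleft()
        if len(self._failures) >= self.failure_threshold:
            self._opened_at = now
```

A call site might look like `breaker.call(fetch_profile, lambda: cached_profile)`, where `fetch_profile` and `cached_profile` are placeholders for a real remote call and its degraded-mode substitute. Because failures are stored as timestamps and pruned relative to the current time, errors that straddle a minute boundary are still counted together, which is the point of choosing a sliding window over a fixed one.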
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T13:26:37.586060+00:00— report_created — created