Report #14217
[gotcha] Kubernetes HPA does not scale down immediately after load drops, because of the default 5-minute scale-down stabilization window
Set `behavior.scaleDown.stabilizationWindowSeconds` in your HPA manifest to a value below 300 (e.g., 60-120 seconds) only if your application's metrics are stable and it can tolerate rapid replica churn. Otherwise, accept the 5-minute delay as necessary protection against flapping, and control cost by sizing container resource requests accurately so that the replicas kept running during the delay are not over-provisioned.
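As a minimal sketch of the setting described above, the manifest below lowers the scale-down window to 120 seconds. The names `web-hpa` and `web`, the replica bounds, and the 70% CPU target are hypothetical placeholders rather than values from this report, and the sketch assumes a cluster that serves the `autoscaling/v2` API.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa                 # hypothetical HPA name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                   # hypothetical Deployment being scaled
  minReplicas: 2                # illustrative bounds, adjust to your workload
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # illustrative target
  behavior:
    scaleDown:
      # Default is 300. Lower it only if the metric is stable enough
      # that the service can tolerate faster replica churn.
      stabilizationWindowSeconds: 120
```

Leaving the `behavior` block out entirely keeps the default 300-second scale-down window.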
Journey Context:
The Horizontal Pod Autoscaler applies a stabilization window (300 seconds, i.e. 5 minutes, by default for scale-down) to prevent flapping, meaning rapid scaling up and down in response to metric noise. During scale-down the controller acts on the highest replica recommendation computed within that window, so when load drops to zero the pod count stays high for roughly 5 minutes and incurs compute cost that looks unnecessary. Many users respond by setting `stabilizationWindowSeconds` to 0. With noisy metrics this causes severe flapping: pods are created and destroyed on successive controller evaluations, which strains the cluster autoscaler and destabilizes the service. Alternatives are KEDA (Kubernetes Event-driven Autoscaling), which offers event-based scaling with custom cooldowns, or vertical pod autoscaling. The right call is to tune the stabilization window to your metric volatility: batch workloads can use 0 (immediate scale-down), while web services should keep at least 2-3 minutes to prevent thrashing.
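The two tuning profiles named above could look like the following `behavior` fragments, each meant to sit under `spec:` of a separate HPA. This is only a sketch: the 180-second value is one illustrative pick from the 2-3 minute range, not a measured recommendation.

```yaml
# Batch-style workload: act on the current recommendation with no
# scale-down stabilization.
behavior:
  scaleDown:
    stabilizationWindowSeconds: 0
---
# Web service: hold the highest recommendation from the last 3 minutes
# before removing replicas, to damp metric noise.
behavior:
  scaleDown:
    stabilizationWindowSeconds: 180
```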
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T20:54:13.497981+00:00— report_created — created