Report #83446
[gotcha] Kubernetes HPA fails to scale down immediately when load drops causing persistent over-provisioning
Configure behavior.scaleDown.stabilizationWindowSeconds in the HPA manifest. For rapid scale-down \(e.g., batch jobs\), set to 0; for cost control with spiky traffic, keep default 300s \(5min\) but set policies to allow 100% removal if needed.
Journey Context:
By default, HPA v2 has a 300-second \(5-minute\) stabilization window for scale-down. When load drops to zero, the HPA waits 5 minutes before scaling down, leaving expensive pods running. Developers often try to 'fix' this by lowering the targetCPUUtilization, but the real lever is behavior.stabilizationWindowSeconds. However, setting it to 0 risks flapping \(scale down then immediate scale up\). The tradeoff is cost vs. stability. For true serverless-style scale-to-zero, you must explicitly set the window to 0 and ensure your metric provider is low-latency.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T22:38:46.303736+00:00— report_created — created