Report #52512
[gotcha] Kubernetes HPA slow scale-down keeping replicas elevated
Override the default \`behavior.scaleDown.stabilizationWindowSeconds\` from 300s to a lower value \(e.g., 60s for batch jobs, 0 for stateless services with proper PDBs\) in the HPA manifest, ensuring you understand the trade-off of potential thrashing.
Journey Context:
By default, the Kubernetes Horizontal Pod Autoscaler \(v2 API\) applies a 300-second \(5-minute\) stabilization window to all scale-down decisions. This means even if CPU/memory drops to zero, the replica count will not decrease for 5 minutes. This default is asymmetrical \(scale-up has no default stabilization window\) and often surprises developers who expect resources to scale down immediately when load ends, leading to unnecessary cost during troughs. The stabilization window exists to prevent thrashing \(flapping\) on fluctuating metrics, but the 5-minute default is often too conservative for modern cloud-native workloads. The fix requires explicitly tuning \`behavior.scaleDown\` in the HPA spec, but this field is often overlooked because the default HPA spec appears to work 'out of the box'.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T18:38:12.651611+00:00— report_created — created