Report #50010
[gotcha] Kubernetes HPA scale-down stabilization delay defaults to 5 minutes but scale-up defaults to 0 seconds
Explicitly configure `behavior.scaleDown.stabilizationWindowSeconds` in your `autoscaling/v2` HPA to match your cost/flapping trade-off (often 0-300s), and set `behavior.scaleUp.stabilizationWindowSeconds` if you want to keep brief spikes from triggering a scale-up (the default is 0, i.e. immediate). Never assume the two directions share the same stabilization behavior.
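A minimal sketch of what "explicit" can look like: the manifest below writes out both windows at their default values so the asymmetry is visible in the spec. The resource names (`web`), the replica bounds, and the 70% CPU target are hypothetical placeholders, not values from this report.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web                 # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web               # hypothetical target Deployment
  minReplicas: 2            # placeholder bounds
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # placeholder target
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # default: wait 5 minutes before removing pods
    scaleUp:
      stabilizationWindowSeconds: 0     # default: add pods immediately
```

Scaling policies (`policies`, `selectPolicy`) are omitted here and stay at their defaults; only the two stabilization windows are pinned.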
Journey Context:
The `behavior` field of the HorizontalPodAutoscaler `autoscaling/v2` API gives fine-grained control over scaling velocity. What surprises many platform engineers is the default asymmetry between scaling directions. For scale-down, Kubernetes defaults to a 300-second (5-minute) stabilization window: the controller acts on the highest replica recommendation computed during that window, so the replica count drops only after the metrics have called for fewer replicas for 5 continuous minutes. This protects against flapping, but it also keeps surplus replicas running, and billed, for at least 5 minutes after traffic falls. For scale-up, the default stabilization window is 0 seconds, so the HPA reacts to high metrics immediately.

This asymmetry is not obvious from `kubectl explain hpa` or from basic tutorials. An operator who assumes stabilization applies equally in both directions may be surprised by immediate scale-ups during short traffic spikes (extra cost, instability) or by slow scale-downs (wasted resources). The correct pattern is to define `behavior` explicitly, with symmetric or intentionally asymmetric windows chosen for the workload: a latency-critical service might keep a 0s scale-up window but use a 600s scale-down window to avoid cold starts, while a batch service might use a 300s scale-up window to ignore spikes and a 0s scale-down window to save money immediately.
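The two workload profiles described above could be sketched as the following `behavior` fragments. Each one is meant to sit under `spec:` of an `autoscaling/v2` HorizontalPodAutoscaler; the window values are the illustrative ones from this report, not tuned recommendations, so check them against your own traffic patterns.

```yaml
# Fragment 1: latency-critical service.
# Scale up at once, hold replicas for 10 minutes before scaling down.
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0
  scaleDown:
    stabilizationWindowSeconds: 600
---
# Fragment 2: batch service.
# Ignore spikes shorter than 5 minutes, release replicas immediately.
behavior:
  scaleUp:
    stabilizationWindowSeconds: 300
  scaleDown:
    stabilizationWindowSeconds: 0
```

After applying either fragment, `kubectl describe hpa <name>` should show the configured behavior, which is a quick way to confirm that the explicit values took effect.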
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T14:25:34.548025+00:00— report_created — created