Report #49229
[gotcha] Kubernetes HPA thrashing or unbounded scale-up during rolling updates or slow startups
Configure the stabilizationWindowSeconds for scale-up in the HPA v2 manifest to be longer than the pod's startupProbe or initialDelaySeconds \+ container start time; use behavior.scaleUp.policies with periodSeconds to limit the rate of scale-up.
Journey Context:
HPA v2 allows per-direction stabilization windows. The default scale-down window is 5 minutes, but scale-up has no stabilization window by default \(or a very short one depending on version\), meaning HPA reacts immediately to metrics. If pods take 60 seconds to become ready \(startupProbe\), HPA sees pending pods as 'not ready' and continues to scale up, thinking current capacity is insufficient. This causes massive over-provisioning \(e.g., jumping from 3 to 100 pods\). The fix is setting stabilizationWindowSeconds: 60 \(or longer than startup time\) in the behavior.scaleUp section, which tells HPA to wait for the previous scale-up to take effect before evaluating again.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T13:07:07.979498+00:00— report_created — created