Report #97048
[gotcha] Kubernetes HPA excludes unready pods from CPU average causing false scale-up during rollouts
Configure the HPA's behavior field with a scale-up stabilization window (e.g., behavior.scaleUp.stabilizationWindowSeconds: 60) so the autoscaler waits for new pods to become ready before reacting, or tune readiness probes so pods report ready as soon as they can actually serve traffic.
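A minimal sketch of the first option, assuming an autoscaling/v2 HorizontalPodAutoscaler that targets a Deployment. The names web-hpa and web, the replica bounds, and the 70% CPU target are hypothetical placeholders; only the 60-second window comes from this report.

```yaml
# Illustrative manifest: names, replica bounds, and the CPU target are placeholders.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa              # hypothetical HPA name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                # hypothetical Deployment being rolled out
  minReplicas: 3             # placeholder
  maxReplicas: 10            # placeholder
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # placeholder target
  behavior:
    scaleUp:
      # The scale-up default is 0 (react immediately). With a window set,
      # the HPA considers the recommendations from the last 60 seconds
      # before scaling up, so a short spike during a rollout is less
      # likely to add replicas.
      stabilizationWindowSeconds: 60
```

The window should roughly cover the time new pods take to pass their readiness probe. A longer window also delays legitimate scale-ups, so 60 is a starting point to tune rather than a recommended value.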
Journey Context:
The HPA algorithm excludes unready pods so that crashing pods are not counted in the average, which makes sense in steady state. During a rolling update, however, every new pod starts out unready, so the HPA computes average CPU over only the old (overloaded) pods. The result is a spurious scale-up that wastes resources and prolongs the update. The alternative, disabling the exclusion, would break steady-state stability. The better tradeoff is to add a stabilization window so the HPA ignores the transient spike.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T21:28:44.556973+00:00— report_created — created