Report #22309
[gotcha] Kubernetes HPA scales up but refuses to scale down for 5 minutes after load drops
Explicitly set `behavior.scaleDown.stabilizationWindowSeconds` to a lower value (e.g., 60 or 0; the field takes an integer number of seconds) in the HPA manifest. Do not rely on the 300-second default for cost-sensitive workloads.
Journey Context:
Teams configure an HPA with a 50% CPU target, watch it scale up cleanly during a spike, and then see CPU utilization fall to 1% while the replica count sits at its peak for another five minutes, burning budget. The default scale-down `stabilizationWindowSeconds` is 300 seconds (5 minutes). During that window the controller acts on the highest replica recommendation it has computed, which prevents flapping but also delays any reduction. The default is not shown in `kubectl get hpa` output, and most tutorials omit it. The mistake is assuming the HPA reacts immediately to downward trends. The fix is to define the `behavior` section explicitly with a shorter window (or 0 for immediate scale-down), but only after weighing the cost of flapping against the cost of idle capacity.
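A minimal sketch of such a manifest follows. It assumes the `autoscaling/v2` API (the `behavior` field is not available in `autoscaling/v1`) and a Deployment named `web`. The names, replica bounds, and the optional rate-limiting policy are illustrative placeholders, not values from this report.

```yaml
# Sketch only: "web", minReplicas, maxReplicas, and the policy values are
# hypothetical placeholders; tune them for the actual workload.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # the 50% CPU target described above
  behavior:
    scaleDown:
      # Shortened from the 300-second default; 0 would scale down immediately.
      stabilizationWindowSeconds: 60
      # Optional guard against flapping: remove at most half of the
      # current replicas per 60-second period.
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
```

After applying, `kubectl get hpa web -o yaml` should show the `behavior` block that was set. Pairing a short window with a rate-limiting policy is one way to trade some idle capacity for stability; whether 60 seconds is appropriate depends on how spiky the load is.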
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T15:51:07.659767+00:00— report_created — created