Report #10130
[gotcha] HPA not scaling up during load because Terminating pods are included in the average metric calculation denominator
Make graceful shutdown finish quickly (lower `terminationGracePeriodSeconds` and keep `preStop` hooks short); use `PodDisruptionBudget` objects carefully; where the metric type allows it, prefer an `AverageValue` target over `Utilization` (`Utilization` targets apply only to resource metrics); consider KEDA for event-driven scaling, which scales on external signals such as queue depth rather than on per-pod averages.
Journey Context:
When pods are stuck in `Terminating` (e.g., due to finalizers or slow `preStop` hooks), the behavior reported here is that HPA computes average resource usage as `total_usage / (running_pods + terminating_pods)`. Terminating pods contribute little or no usage but still count in the denominator, so the computed average drops below the target and HPA does not scale up during a traffic spike. As described in this report, the HPA algorithm sets aside unready pods for CPU metrics but still counts terminating pods in the denominator. The fix is either to make shutdowns fast, so pods spend little time in `Terminating`, or to use KEDA, which scales on external metrics such as queue depth instead of a per-pod average and so avoids the denominator trap.
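The denominator effect described above can be illustrated with a little arithmetic. The following is a minimal sketch, not the HPA controller's code: the function `desired_replicas` and its parameters are hypothetical names, the 10% tolerance is assumed to match the common default, and terminating pods are assumed to report zero usage. It applies the documented scaling formula `ceil(currentReplicas * currentMetric / targetMetric)` with and without terminating pods in the denominator. Actual behavior may differ by Kubernetes version, so check it against your cluster.

```python
import math


def desired_replicas(running_usage, target, terminating=0, tolerance=0.1):
    """Model the replica count HPA would request (illustrative only).

    running_usage: per-pod utilization (percent of request) for running pods.
    target:        target average utilization in percent.
    terminating:   hypothetical count of Terminating pods that are assumed
                   to add zero usage but still count in the denominator.
    tolerance:     assumed no-op band around a ratio of 1.0.
    """
    running = len(running_usage)
    # Average over running + terminating pods, as described in the report.
    average = sum(running_usage) / (running + terminating)
    ratio = average / target
    if abs(ratio - 1.0) <= tolerance:
        return running  # within tolerance: no scaling action
    # Never model a scale to zero in this sketch.
    return max(1, math.ceil(running * ratio))


busy_pods = [90, 90, 90, 90]  # four running pods at 90% utilization

# Only running pods counted: average 90% against a 70% target.
print(desired_replicas(busy_pods, target=70))                 # 6

# Two Terminating pods in the denominator: average falls to 60%.
print(desired_replicas(busy_pods, target=70, terminating=2))  # 4
```

In this example the same four overloaded pods produce a request for six replicas when only running pods are averaged, but no scale-up at all once two terminating pods dilute the average from 90% to 60%.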
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T09:52:12.635825+00:00— report_created — created