Report #40992
[gotcha] Kubernetes CPU limits causing CFS throttling despite low actual CPU usage
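Remove CPU limits for latency-sensitive workloads (keep only requests), or enable CFS burst (a Linux 5.14+ kernel feature, set through cpu.cfs_burst_us on cgroup v1 or cpu.max.burst on cgroup v2; upstream kubelet does not appear to expose a flag for it, so it typically has to be configured with node-level tooling), or use the static CPU Manager policy on dedicated nodes.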
Journey Context:
The kernel's CFS (Completely Fair Scheduler) bandwidth control enforces CPU limits over a 100ms period (cfs_period_us), with a per-period quota of limit_in_millicores / 1000 * 100ms. A 500m limit, for example, yields 50ms of CPU time per period, summed across all threads in the container. If a container spends its entire quota in a burst inside that 100ms window, it is throttled for the remainder of the period, even when its average usage over a minute is far below the limit. This shows up as p99 latency spikes and rising throttling metrics in Prometheus (container_cpu_cfs_throttled_seconds_total) while kubectl top shows 30% CPU. The fix is counter-intuitive: removing limits and relying only on requests often gives better tail latency for latency-sensitive apps, because requests map to CFS shares, which only arbitrate CPU under contention and let a container burst into unused capacity without hitting a hard quota wall. Note that a pod without CPU limits is in the Burstable QoS class rather than Guaranteed. For workloads that truly need hard limits, Linux 5.14+ supports CFS burst (cpu.cfs_burst_us on cgroup v1, cpu.max.burst on cgroup v2), which lets a container carry over unused quota and briefly exceed its per-period quota.
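The quota arithmetic and the "throttled at low average usage" effect described above can be illustrated with a small sketch. This is a deliberately simplified model, not the kernel's actual accounting: it assumes CPU demand that does not fit in a period's quota is simply deferred to the next period. The names cfs_quota_us and simulate are hypothetical helpers introduced for illustration only.

```python
CFS_PERIOD_US = 100_000  # default cfs_period_us: 100 ms


def cfs_quota_us(limit_millicores: int, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time (in microseconds) a container may use per CFS period."""
    return limit_millicores * period_us // 1000


def simulate(demand_us_per_period, limit_millicores, period_us=CFS_PERIOD_US):
    """Return (throttled_periods, average utilisation of the limit).

    Simplified assumption: demand exceeding the quota in one period is
    carried over as backlog, and a period counts as throttled when it
    ends with work still waiting.
    """
    quota = cfs_quota_us(limit_millicores, period_us)
    backlog = 0
    throttled = 0
    used_total = 0
    for demand in demand_us_per_period:
        pending = backlog + demand
        used = min(pending, quota)
        backlog = pending - used
        if backlog > 0:
            throttled += 1
        used_total += used
    utilisation = used_total / (quota * len(demand_us_per_period))
    return throttled, utilisation


# A 500m limit, with a 150 ms burst of CPU demand once per second
# (for example several threads waking at once), idle otherwise.
bursty = ([150_000] + [0] * 9) * 6
throttled, utilisation = simulate(bursty, limit_millicores=500)
print(f"bursty: {throttled} of {len(bursty)} periods throttled "
      f"at {utilisation:.0%} average utilisation")

# The same total demand spread evenly never hits the quota.
smooth = [15_000] * 60
throttled, utilisation = simulate(smooth, limit_millicores=500)
print(f"smooth: {throttled} of {len(smooth)} periods throttled "
      f"at {utilisation:.0%} average utilisation")
```

Under this model both workloads average 30% of the limit, but only the bursty one is throttled, which is the pattern behind latency spikes alongside a calm kubectl top reading.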
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:16:35.506985+00:00— report_created — created