Report #44212
[gotcha] Kubernetes CPU limits cause throttling on bursty workloads even when average CPU usage is below the limit
For latency-sensitive bursty applications, either remove CPU limits and set only requests, or shorten the CFS quota period in the kubelet configuration (cpuCFSQuotaPeriod, default 100ms; try 10ms-50ms). Alternatively, use the 'static' CPU Manager policy so that Guaranteed pods with integer CPU requests are pinned to exclusive cores, which avoids most CFS throttling.
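As a rough sketch of what shortening the period changes, the per-period quota and the longest stall inside a single period can be computed directly. The function names are hypothetical, and the model assumes all busy threads run flat out from the start of the period; real scheduling is messier.

```python
def cfs_quota_us(limit_cores: float, period_us: int) -> int:
    """CPU time (microseconds) the container may consume per period: limit x period."""
    return round(limit_cores * period_us)


def longest_stall_us(limit_cores: float, period_us: int, busy_threads: int) -> float:
    """Longest throttled window within one period under the simplified model.

    busy_threads threads burn the quota in quota / busy_threads of wall time;
    the container is then throttled until the period ends.
    """
    run_window_us = cfs_quota_us(limit_cores, period_us) / busy_threads
    return max(0.0, period_us - run_window_us)


if __name__ == "__main__":
    # limit = 1 CPU, two busy threads, at three candidate periods
    for period_us in (100_000, 50_000, 10_000):
        stall = longest_stall_us(1.0, period_us, busy_threads=2)
        print(f"period={period_us}us quota={cfs_quota_us(1.0, period_us)}us stall={stall}us")
```

In this model the CPU budget per second is unchanged; only the length of each individual stall shrinks with the period.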
Journey Context:
The Linux CFS (Completely Fair Scheduler) enforces CPU limits as a quota of CPU time per period (default 100ms). A container with limit=1 CPU gets 100ms of CPU time per 100ms of wall-clock time. A bursty application that tries to use 200ms of CPU within one 100ms period, for example two threads running in parallel, exhausts its quota after 50ms and is throttled for the rest of the period, even if its long-term average is 0.5 CPU. The result is latency spikes that do not show up in standard 'CPU usage' metrics, which report averages. Many teams respond by raising limits blindly, which wastes resources. The correct fix depends on the workload: for truly latency-critical services, removing limits (relying only on requests for scheduling) prevents throttling. For mixed workloads, shortening the CFS quota period (cpuCFSQuotaPeriod) makes the throttling granularity finer. Static CPU management (pinning) is the nuclear option and applies only to pods in the Guaranteed QoS class.
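The arithmetic in this explanation can be checked with a small simulation. This is a simplified model, not the kernel's actual accounting: it assumes the burst starts exactly at a period boundary and that every thread is runnable the whole time. The function name and the 400ms burst interval are made up for illustration.

```python
def simulate_burst(cpu_demand_ms: float, threads: int,
                   limit_cores: float = 1.0, period_ms: float = 100.0):
    """Return (wall_ms, throttled_ms) for one burst under a CFS-style quota."""
    quota_ms = limit_cores * period_ms
    remaining = cpu_demand_ms
    wall = 0.0
    throttled = 0.0
    while remaining > 0:
        # CPU time usable this period: capped by the quota, by the work left,
        # and by how much the threads can physically burn in one period.
        used = min(quota_ms, remaining, threads * period_ms)
        remaining -= used
        run_wall = used / threads
        if remaining > 0:
            # Work is still pending, so the container waits for the next period.
            wall += period_ms
            throttled += period_ms - run_wall
        else:
            wall += run_wall
    return wall, throttled


if __name__ == "__main__":
    wall, throttled = simulate_burst(cpu_demand_ms=200, threads=2)
    print(wall, throttled)  # 150.0 50.0 (100.0 wall-clock ms without a limit)

    # Assumed example: one such burst every 400 ms of wall-clock time.
    print(200 / 400)  # 0.5 cores on average, well under the 1-CPU limit
```

Under these assumptions the burst takes 150ms instead of 100ms, while the averaged usage stays at half the limit, which is why the delay is easy to miss on a usage dashboard.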
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:40:59.993787+00:00— report_created — created