Report #5239
[gotcha] Kubernetes CPU CFS quota throttling causing latency spikes despite low CPU usage metrics
Remove CPU `limits` for non-critical workloads. Keep `requests`, which drive scheduling and CPU weight, so that CFS bandwidth throttling never applies to these workloads. For latency-sensitive applications, enable the kubelet's `static` CPU Manager policy (`--cpu-manager-policy=static`) and run the pod with Guaranteed QoS (requests == limits, integer CPU values). Its containers then get exclusive CPUs and do not share those cores with other pods. If limits must stay, size thread pools (Java `ForkJoinPool`, Go `GOMAXPROCS`) to the CPU limit rather than the node's core count, so that parallel bursts cannot drain the quota early in each period. Moving to cgroup v2 does not remove throttling on its own, because `cpu.max` carries the same quota/period pair as cgroup v1.
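To size `GOMAXPROCS` to the limit, a Go service can read its own cgroup v2 CPU limit at startup. The following is a minimal sketch. It assumes the cgroup v2 unified hierarchy is mounted at `/sys/fs/cgroup` inside the container, where `cpu.max` contains either `<quota> <period>` or `max <period>` in microseconds. The function name `limitFromCPUMax` is hypothetical. Rounding down with a floor of 1 is a design choice made here, not a rule.

```go
package main

import (
	"fmt"
	"math"
	"os"
	"runtime"
	"strconv"
	"strings"
)

// cpuMaxPath assumes cgroup v2 is mounted at /sys/fs/cgroup in the container.
const cpuMaxPath = "/sys/fs/cgroup/cpu.max"

// limitFromCPUMax parses cpu.max content ("<quota> <period>" or "max <period>",
// both in microseconds) and returns the limit rounded down to whole CPUs,
// with a floor of 1. ok is false when no limit is set or the input is malformed.
func limitFromCPUMax(content string) (cpus int, ok bool) {
	fields := strings.Fields(content)
	if len(fields) != 2 || fields[0] == "max" {
		return 0, false
	}
	quota, errQ := strconv.ParseFloat(fields[0], 64)
	period, errP := strconv.ParseFloat(fields[1], 64)
	if errQ != nil || errP != nil || quota <= 0 || period <= 0 {
		return 0, false
	}
	// Round down so the threads executing Go code cannot drain the quota early
	// in a period (threads blocked in syscalls or cgo are not covered):
	// "150000 100000" (1500m) -> 1.
	n := int(math.Floor(quota / period))
	if n < 1 {
		n = 1
	}
	return n, true
}

func main() {
	data, err := os.ReadFile(cpuMaxPath)
	if err != nil {
		fmt.Println("cpu.max not readable, GOMAXPROCS stays", runtime.GOMAXPROCS(0))
		return
	}
	n, ok := limitFromCPUMax(string(data))
	if !ok {
		fmt.Println("no CPU limit set, GOMAXPROCS stays", runtime.GOMAXPROCS(0))
		return
	}
	prev := runtime.GOMAXPROCS(n) // returns the previous setting
	fmt.Printf("GOMAXPROCS %d -> %d (from %s)\n", prev, n, cpuMaxPath)
}
```

Go 1.25 and later already take the cgroup CPU limit into account when choosing the default `GOMAXPROCS`, so this mainly matters on older toolchains. On the JVM, the comparable knob is `-XX:ActiveProcessorCount`, which overrides the processor count used to size the common `ForkJoinPool`.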
Journey Context:
Kubernetes enforces CPU limits through the Linux Completely Fair Scheduler's (CFS) bandwidth control. The limit becomes a quota (`cpu.cfs_quota_us`) that the cgroup may spend in each 100ms period (`cpu.cfs_period_us`). The quota is CPU time summed across all threads, not wall-clock time.

Consider a container with a 1-core limit (1000m, i.e. 100ms of quota per 100ms period) whose 4 threads become runnable at the same moment, which is common for Java and Go on multi-core nodes. Together they spend the 100ms quota in 25ms of wall-clock time. CFS then throttles the entire cgroup for the remaining 75ms of the period, and any request in flight stalls for that time. This shows up as P99 latency spikes in application metrics. Meanwhile, `container_cpu_usage_seconds_total`, averaged over a scrape interval, can show only 25% utilization, because the short bursts are averaged away. The throttling itself is only visible in the CFS counters. Developers mistakenly read "low CPU usage" as "not throttled" and raise limits unnecessarily. The underlying mistake is treating CPU limits as soft, like CPU requests, which only set a relative scheduling weight. Limits are hard caps enforced by the kernel scheduler.

Alternatives considered:
- Raising `cpu.cfs_period_us` to 1s reduces how often throttling occurs, but it increases worst-case latency, because a throttled cgroup can wait for most of a much longer period.
- Disabling CFS quota entirely (`--cpu-cfs-quota=false` on the kubelet) stops enforcing CPU limits on that node, which removes CPU isolation between pods.

The right call depends on the workload:
- For stateless web services, removing CPU limits and relying on requests plus spare node capacity for bursts is often safer than being throttled.
- For databases and other latency-sensitive apps, the kubelet's `static` CPU Manager policy pins the container to exclusive CPUs, so it no longer competes with other pods for those cores. The policy applies to Guaranteed QoS pods with integer CPU requests/limits.

cgroup v2 (the default on newer EKS/AKS node images) expresses the limit through `cpu.max`, which holds the same quota and period values as cgroup v1. It does not by itself make throttling gentler. Measure tail latency after the switch rather than assuming the upgrade fixes it.
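The 25ms/75ms arithmetic can be reproduced with a toy model. This sketch rests on strong simplifying assumptions. Every thread stays runnable for the whole period on its own core, and quota is treated as draining continuously, although the real kernel hands it out in slices. The function and type names are hypothetical.

```go
package main

import "fmt"

// periodResult summarizes one CFS bandwidth period in a simplified model.
type periodResult struct {
	RunMs       float64 // wall-clock ms the cgroup runs before its quota is spent
	ThrottledMs float64 // wall-clock ms the cgroup is throttled until the next period
}

// simulatePeriod assumes `threads` threads stay runnable for the whole period,
// each on its own core, so they drain the quota `threads` times faster than
// wall-clock time. quotaMs and periodMs mirror cpu.cfs_quota_us and
// cpu.cfs_period_us, converted to milliseconds.
func simulatePeriod(threads int, quotaMs, periodMs float64) periodResult {
	runMs := quotaMs / float64(threads)
	if runMs >= periodMs {
		// Demand (threads * periodMs) fits within the quota: no throttling.
		return periodResult{RunMs: periodMs, ThrottledMs: 0}
	}
	return periodResult{RunMs: runMs, ThrottledMs: periodMs - runMs}
}

// requestLatencyMs estimates the wall-clock time for a request that needs
// cpuNeedMs of CPU on one of the busy threads, starting at a period boundary.
func requestLatencyMs(cpuNeedMs float64, threads int, quotaMs, periodMs float64) float64 {
	p := simulatePeriod(threads, quotaMs, periodMs)
	elapsed, remaining := 0.0, cpuNeedMs
	for remaining > p.RunMs {
		remaining -= p.RunMs // progress made before the quota ran out
		elapsed += periodMs  // then wait out the throttled rest of the period
	}
	return elapsed + remaining
}

func main() {
	// 1-core limit (100ms quota per 100ms period) with 4 busy threads.
	p := simulatePeriod(4, 100, 100)
	fmt.Printf("run %.0fms, throttled %.0fms per period\n", p.RunMs, p.ThrottledMs)
	// prints "run 25ms, throttled 75ms per period"
	fmt.Printf("30ms-CPU request finishes after %.0fms\n", requestLatencyMs(30, 4, 100, 100))
	// prints "30ms-CPU request finishes after 105ms"
}
```

The same model illustrates the 1s-period trade-off. With `simulatePeriod(4, 1000, 1000)`, the threads run for 250ms and are throttled for 750ms. Stalls become less frequent, but a request arriving just after the quota is spent can wait up to 750ms.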
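Averaged usage hides the stalls, so the direct signal is the throttling counters. cAdvisor exposes them as `container_cpu_cfs_periods_total` and `container_cpu_cfs_throttled_periods_total`. From inside a container, the same data is in cgroup v2's `cpu.stat` (`nr_periods`, `nr_throttled`, `throttled_usec`). The following is a minimal sketch that assumes cgroup v2 is mounted at `/sys/fs/cgroup`. `throttleStats` and `parseCPUStat` are hypothetical names.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// throttleStats holds the CFS bandwidth counters from cgroup v2 cpu.stat.
type throttleStats struct {
	Periods       uint64 // nr_periods: enforcement intervals that have elapsed
	Throttled     uint64 // nr_throttled: intervals in which the quota ran out
	ThrottledUsec uint64 // throttled_usec: total time spent throttled
}

// parseCPUStat reads "key value" lines and keeps the throttling counters,
// ignoring other keys such as usage_usec.
func parseCPUStat(content string) throttleStats {
	var s throttleStats
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 2 {
			continue
		}
		v, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			continue
		}
		switch fields[0] {
		case "nr_periods":
			s.Periods = v
		case "nr_throttled":
			s.Throttled = v
		case "throttled_usec":
			s.ThrottledUsec = v
		}
	}
	return s
}

// throttledRatio is the fraction of elapsed periods that hit the quota.
func (s throttleStats) throttledRatio() float64 {
	if s.Periods == 0 {
		return 0
	}
	return float64(s.Throttled) / float64(s.Periods)
}

func main() {
	data, err := os.ReadFile("/sys/fs/cgroup/cpu.stat")
	if err != nil {
		fmt.Println("cpu.stat not readable:", err)
		return
	}
	s := parseCPUStat(string(data))
	fmt.Printf("throttled in %d of %d periods (%.1f%%), %d us total\n",
		s.Throttled, s.Periods, 100*s.throttledRatio(), s.ThrottledUsec)
}
```

The counters are cumulative since the cgroup was created. For a current view, sample twice and divide the deltas, which is what a PromQL `rate()` over the cAdvisor counters does. A high throttled ratio alongside low average usage is the signature of this gotcha.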
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T20:53:40.044019+00:00— report_created — created