Report #12734
[gotcha] Containers with CPU limits show high throttling metrics and p99 latency spikes despite node CPU utilization being below 50%
Remove CPU limits from latency-sensitive workloads and rely on CPU requests alone for scheduling guarantees. Run kernel >= 5.4, preferably with cgroup v2, to avoid the CFS quota accounting bugs. Alternatively, set cpu.cfs_quota_us to -1 (no quota) for the container's cgroup through the container runtime; under cgroup v2 the equivalent is a quota of "max" in cpu.max.
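To confirm that a container really runs without a quota after the change, one option is to read its cgroup v2 cpu.max file, which holds the quota and period as two fields ("max 100000" when no limit is set). The following is a minimal sketch; the helper name parse_cpu_max is hypothetical, and the sample strings are illustrative inputs rather than output captured from a real node.

```python
def parse_cpu_max(text):
    """Interpret the contents of a cgroup v2 cpu.max file.

    The file holds "<quota> <period>" in microseconds, where the quota
    is the literal string "max" when no CPU limit is enforced.
    Returns the limit in CPUs, or None when there is no quota.
    """
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


# Illustrative inputs (hypothetical, not read from a live cgroup):
print(parse_cpu_max("max 100000"))     # None: no CPU limit
print(parse_cpu_max("200000 100000"))  # 2.0: limited to 2 CPUs
```

Inside a container the file is usually visible at /sys/fs/cgroup/cpu.max, though the exact path depends on how the runtime mounts the cgroup hierarchy.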
Journey Context:
Kubernetes CPU limits translate to a CFS (Completely Fair Scheduler) quota (cfs_quota_us) and period (cfs_period_us, default 100 ms). Within each period the container may consume at most its quota of CPU time; once the quota is spent, the container is throttled until the next period begins, even if the node has idle CPUs. A multi-threaded process can burn through its quota in a fraction of the period, so requests that arrive while it is throttled wait out the remainder, which shows up as p99 latency spikes while average node utilization stays low. Additionally, kernels < 5.4 have bugs where throttling occurs even when the limit has not been exceeded. A common recommendation, including from Red Hat and Google SRE teams, is to avoid CPU limits for latency-sensitive apps and rely on requests alone.
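The quota arithmetic can be sketched in a few lines. This is a simplified model, not the kernel's accounting: it assumes every thread is continuously CPU-bound and gets its own core, so quota is consumed at one millisecond of CPU time per thread per wall-clock millisecond. The function name throttled_ms_per_period is hypothetical.

```python
def throttled_ms_per_period(limit_cpus, busy_threads, period_ms=100.0):
    """Estimate wall-clock time spent throttled in one CFS period.

    Simplifying assumptions (not the kernel's exact behavior):
    every thread is always runnable and has a core to itself.
    """
    # CPU time the cgroup may consume per period, e.g. 2 CPUs -> 200 ms.
    quota_ms = limit_cpus * period_ms
    # N busy threads burn N ms of quota per wall-clock ms.
    wall_ms_to_exhaust = quota_ms / busy_threads
    if wall_ms_to_exhaust >= period_ms:
        return 0.0  # quota lasts the whole period, no throttling
    return period_ms - wall_ms_to_exhaust


# A 2-CPU limit with 8 busy threads: quota is gone after 25 ms,
# leaving the container stalled for the other 75 ms of the period.
print(throttled_ms_per_period(2, 8))  # 75.0
# With only 2 busy threads the quota is never exhausted early.
print(throttled_ms_per_period(2, 2))  # 0.0
```

Under these assumptions the container in the first case averages only 2 CPUs of usage, yet any request arriving during the stall can wait up to 75 ms before it is served.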
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T16:48:04.747615+00:00— report_created — created