Report #90393
[gotcha] Container latency spikes despite low average CPU usage \(throttling at 1000m limit\)
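Remove CPU limits for latency-sensitive workloads and rely on CPU requests alone for scheduling and fair sharing. Alternatively, disable CFS quota enforcement node-wide in the kubelet configuration \(cpuCFSQuota: false, which leaves cpu.cfs\_quota\_us at -1\), or use CPU pinning \(cpuset, via the static CPU Manager policy\) so that Guaranteed QoS pods with integer CPU requests run on dedicated cores.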
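As a rough illustration of the first option, the sketch below builds a pod manifest that sets a CPU request but no CPU limit. The function name, pod name, image, and resource values are hypothetical placeholders. Keeping a memory limit is an assumption of this sketch, not something the report prescribes.

```python
import json


def latency_sensitive_pod(name, image, cpu_request="500m", memory="256Mi"):
    """Return a Pod manifest (as a dict) with a CPU request and no CPU limit.

    All argument values are illustrative placeholders.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "resources": {
                        # The CPU request drives scheduling and the CFS share weight.
                        "requests": {"cpu": cpu_request, "memory": memory},
                        # Assumption: only the CPU limit is dropped; the memory
                        # limit stays. No "cpu" key here means no CFS quota is
                        # derived from this spec.
                        "limits": {"memory": memory},
                    },
                }
            ]
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(latency_sensitive_pod("api", "example.com/api:1.0"), indent=2))
```

If saved as a script, the output could be piped to kubectl apply -f - for a quick test. A pod built this way lands in the Burstable QoS class, so it does not qualify for the CPU pinning option.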
Journey Context:
Kubernetes enforces CPU limits with CFS \(Completely Fair Scheduler\) bandwidth quotas, accounted over a default 100ms period. A 1000m limit grants 100ms of CPU time per period, shared by all threads in the container. If a multi-threaded container burns that entire quota in 10ms of wall-clock time \(e.g., 10 threads working in parallel on a request\), it is throttled for the remaining 90ms of the period, causing p99 latency spikes while 'kubectl top' shows average CPU well below the limit. Teams often raise limits to 2000m\+ to 'fix' this, which wastes resources and only moves the threshold. The real fix is removing CPU limits \(relying on requests to prevent starvation\) or disabling CFS quota at the kubelet level. CPU requests map to CFS shares, a soft, proportional weight that applies only under contention and never throttles a container outright. The tradeoff is potential noisy-neighbor interference, but for latency-critical services it is usually worth accepting.
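The throttling arithmetic can be made concrete with a small model. The sketch below is a simplified approximation, not the kernel's actual accounting: it assumes the burst starts at the beginning of a period, that all threads stay runnable until the work is done, and that nothing else competes for the cores. The function name and parameters are made up for illustration.

```python
def burst_latency_ms(work_cpu_ms, threads, limit_millicores=None, period_ms=100.0):
    """Wall-clock time to finish a burst needing `work_cpu_ms` of CPU time.

    Simplified model: the burst starts at a period boundary and `threads`
    threads run fully in parallel. `limit_millicores=None` means no CPU limit.
    """
    if threads <= 0 or work_cpu_ms < 0:
        raise ValueError("threads must be positive and work non-negative")
    if limit_millicores is None:
        quota_ms = float("inf")
    elif limit_millicores <= 0:
        raise ValueError("limit_millicores must be positive")
    else:
        # A 1000m limit yields 100ms of CPU time per 100ms period.
        quota_ms = period_ms * limit_millicores / 1000.0

    # CPU time usable in one period: capped by the quota and by how much
    # the threads can physically consume in period_ms of wall-clock time.
    per_period_ms = min(quota_ms, threads * period_ms)

    remaining = float(work_cpu_ms)
    elapsed = 0.0
    while remaining > per_period_ms:
        # Quota exhausted: the container waits for the next period.
        remaining -= per_period_ms
        elapsed += period_ms
    # The final slice of work runs in parallel across all threads.
    return elapsed + remaining / threads


if __name__ == "__main__":
    throttled = burst_latency_ms(150, threads=10, limit_millicores=1000)
    unlimited = burst_latency_ms(150, threads=10)
    print(f"1000m limit: {throttled} ms, no limit: {unlimited} ms")
```

Under these assumptions, a request needing 150ms of CPU time across 10 threads takes 105ms of wall-clock time with a 1000m limit and 15ms without one. At one such request per second the container averages only 150m of CPU, which is why average-usage metrics look healthy while latency suffers.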
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:19:13.368091+00:00— report_created — created