Report #13454
[gotcha] Application latency spikes and CPU throttling observed even when node CPU utilization is low
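Remove CPU limits from the Pod spec \(keep CPU requests\) so containers can burst into idle node capacity; a Pod with requests but no CPU limit falls into the Burstable QoS class. If Guaranteed QoS is required, consider the kubelet CPU Manager 'static' policy, which pins containers with integer CPU requests to exclusive cores. Where the platform supports it, the CFS quota can also be disabled outright \(the cgroup v1 value 'cpu.cfs\_quota\_us=-1' means no quota\); upstream Kubernetes exposes this node-wide through the kubelet setting 'cpuCFSQuota: false' \(flag '--cpu-cfs-quota=false'\) rather than through a standard Pod annotation.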
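A minimal sketch of the first option, assuming the manifest is generated in Python and applied as JSON \(which kubectl accepts\). The helper name, Pod name, image, and the 500m / 256Mi values are hypothetical placeholders, not values from this report. Note that a namespace LimitRange with a default CPU limit would add the limit back, so check for one before relying on this.

```python
import json


def burstable_pod(name: str, image: str, cpu_request: str, memory: str) -> dict:
    """Build a Pod manifest that sets a CPU request but no CPU limit.

    All argument values are illustrative; this helper is hypothetical.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "resources": {
                        # The CPU request still drives scheduling and the
                        # container's relative CPU weight under contention.
                        "requests": {"cpu": cpu_request, "memory": memory},
                        # Only memory is limited. With no "cpu" key here,
                        # no CFS quota is configured for the container.
                        "limits": {"memory": memory},
                    },
                }
            ]
        },
    }


manifest = burstable_pod(
    "latency-sensitive-api", "example.com/api:1.0", "500m", "256Mi"
)
print(json.dumps(manifest, indent=2))
```

Keeping the memory limit while dropping only the CPU limit is a deliberate choice in this sketch: memory is not compressible, so the throttling argument does not apply to it.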
Journey Context:
Kubernetes enforces CPU limits with Linux CFS \(Completely Fair Scheduler\) bandwidth control, configured through the cgroup settings cpu.cfs\_quota\_us and cpu.cfs\_period\_us \(cpu.max on cgroup v2\). Once a container's threads have consumed the quota within a period \(100 ms by default\), all of them are throttled until the next period begins, regardless of whether the node has idle CPU. Because the quota counts CPU time summed across threads, a multi-threaded container can exhaust it in a small fraction of the period, so throttling can appear even when average utilization looks low. The result is tail latency spikes in latency-sensitive services. The fix is counter-intuitive: removing CPU limits and keeping only requests lets the container burst into spare node capacity without being throttled, trading strict isolation and predictable resource accounting for performance.
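The throttling arithmetic can be illustrated with a deliberately simplified model. This is a sketch under stated assumptions, not how the kernel is implemented: the function name and parameters are hypothetical, the work is assumed to start at a period boundary, to parallelize perfectly across threads, and to face no competition from other containers.

```python
def wall_clock_ms(cpu_work_ms, quota_ms, period_ms=100.0, threads=1):
    """Estimate wall-clock time to finish cpu_work_ms of CPU time.

    Simplified model (an assumption, not kernel behavior in detail):
    each period the container may consume at most quota_ms of CPU time,
    summed across threads, and then sits throttled until the period ends.
    Pass quota_ms=float("inf") to model a container with no CPU limit.
    """
    if quota_ms <= 0 or threads < 1:
        raise ValueError("quota_ms must be positive and threads >= 1")
    remaining = cpu_work_ms
    elapsed = 0.0
    while True:
        # CPU time usable this period: capped by the quota and by what
        # the threads can physically burn in one period.
        runnable = min(remaining, quota_ms, threads * period_ms)
        remaining -= runnable
        if remaining <= 0:
            # Work finishes partway through this period.
            return elapsed + runnable / threads
        # Quota (or the period) is exhausted: wait for the next period.
        elapsed += period_ms


# One thread, 100 ms of CPU work, limit of 0.4 CPU (40 ms per 100 ms period).
print(wall_clock_ms(100, 40))            # 220.0
# Same work with no CPU limit.
print(wall_clock_ms(100, float("inf")))  # 100.0
# Four threads, 200 ms of CPU work, limit of 1 CPU (100 ms per period).
print(wall_clock_ms(200, 100, threads=4))            # 125.0
print(wall_clock_ms(200, float("inf"), threads=4))   # 50.0
```

In the four-thread case the container burns its whole 100 ms quota in the first 25 ms of the period and then waits 75 ms while throttled, which is why latency more than doubles even though the limit corresponds to a full CPU.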
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T18:47:40.074925+00:00— report_created — created