Report #76620
[gotcha] Kubernetes CPU limits cause latency spikes via CFS throttling despite low average CPU usage
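Remove CPU limits for latency-sensitive workloads (rely on requests for scheduling), or run the pod as Guaranteed QoS with integer CPU requests under the kubelet CPU Manager 'static' policy so its containers get exclusive cores via cpuset, or tune cpu.cfs_quota_us / cpu.cfs_period_us (cpu.max on cgroup v2) via the kubelet or a custom runtime. Because the quota/period ratio is fixed by the limit itself, the practical knob is usually a shorter period, which the kubelet exposes as --cpu-cfs-quota-period.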
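To make the quota/period relationship concrete, here is a minimal Python sketch of how a CPU limit is commonly translated into the cgroup quota. The function name `cfs_quota_us` is hypothetical, and the formula assumes a plain quota = limit × period conversion; real runtimes may apply a minimum quota or rounding that is not modelled here.

```python
def cfs_quota_us(limit_millicores: int, period_us: int = 100_000) -> int:
    """CPU time (microseconds) a container may use per CFS period.

    Assumes quota = limit_in_cores * period; any runtime-specific
    minimum or rounding is deliberately left out of this sketch.
    """
    if limit_millicores <= 0 or period_us <= 0:
        raise ValueError("limit and period must be positive")
    return limit_millicores * period_us // 1000


# A 500m limit with the default 100 ms period allows 50 ms of CPU time per period.
print(cfs_quota_us(500))          # 50000
# Shortening the period to 10 ms keeps the same ratio but caps each stall near 10 ms.
print(cfs_quota_us(500, 10_000))  # 5000
```

A shorter period does not raise the throughput ceiling; it only spreads the same budget into smaller slices, so individual stalls get shorter while the limit stays the same.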
Journey Context:
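CFS bandwidth control throttles a container as soon as it uses up its quota within a single enforcement period (100 ms by default), even if its 1-minute average usage is well below the limit. Once the quota is gone, every thread in the container is paused until the next period begins. Bursty, multi-threaded workloads (Java GC, Go GC with its stop-the-world phases) hit this constantly, because several threads running in parallel can burn a whole period's quota in a fraction of that period; a single stall can last most of a period, and added latency reaches 100 ms or more when a burst spans several periods. A common mistake is setting limits = requests at a value sized for average load rather than burst load. Tradeoff: removing CPU limits risks node CPU saturation and noisy-neighbor contention (it does not change OOM behavior, which is governed by memory limits), but for latency-critical services, predictable latency usually beats theoretical protection. CPU shares, which are derived from requests, only set relative weight under contention; they do not prevent hard throttling once a limit is set.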
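The effect described above can be illustrated with a small Python model. This is a simplified sketch, not the kernel's actual accounting: it assumes the burst starts exactly at a period boundary, that all threads run fully in parallel, and that quota is refilled only at period boundaries. The function name `burst_wall_time_ms` and the example numbers are hypothetical.

```python
def burst_wall_time_ms(cpu_work_ms, threads, limit_cores=None, period_ms=100.0):
    """Wall-clock time (ms) to finish a CPU burst under a CFS-style quota.

    cpu_work_ms : total CPU time the burst needs, summed over all threads
    threads     : number of threads running the burst in parallel
    limit_cores : CPU limit in cores, or None for no limit
    period_ms   : enforcement period (100 ms is the usual default)
    """
    if limit_cores is None:
        return cpu_work_ms / threads
    if limit_cores <= 0:
        raise ValueError("limit_cores must be positive")

    quota_ms = limit_cores * period_ms
    remaining = cpu_work_ms
    wall = 0.0
    while remaining > 0:
        # CPU time usable this period: capped by the quota and by what
        # `threads` threads can physically consume in one period.
        usable = min(remaining, quota_ms, threads * period_ms)
        remaining -= usable
        if remaining > 0:
            # Work is left over, so the container waits (throttled or
            # still running) until the next period starts.
            wall += period_ms
        else:
            wall += usable / threads
    return wall


# Hypothetical GC burst: 200 ms of CPU work spread over 4 threads.
print(burst_wall_time_ms(200, 4))                 # 50.0  (no limit)
print(burst_wall_time_ms(200, 4, limit_cores=1))  # 125.0 (1-core limit)
```

In this model the 1-core limit is exhausted after 25 ms of wall time, and the container sits idle for the remaining 75 ms of the period. If such a burst happened once per second, average usage would be about 0.2 cores, far below the limit, yet each burst would still take 125 ms instead of 50 ms.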
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T11:11:59.697303+00:00— report_created — created