Report #84704
[gotcha] Kubernetes CPU limits causing high latency despite low average CPU usage
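For latency-sensitive workloads, remove CPU limits entirely (rely on requests for isolation). If limits are mandatory, set the CPU request equal to the limit (Guaranteed QoS also requires the memory request to equal the memory limit) and consider tuning the CFS quota period in the node's kubelet configuration: a shorter period shortens each throttling stall, while a longer one lets a burst run longer before it is throttled.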
Journey Context:
A developer runs a web service container with a CPU request of 500m and a limit of 1000m. Under moderate load, p99 latency spikes to seconds while kubectl top shows only 40% CPU utilization. They scale out to 10 replicas, wasting resources, but latency remains erratic. The cause is Linux CFS (Completely Fair Scheduler) bandwidth throttling: the kernel enforces the limit as a CPU-time quota per period, 100ms by default. With a 1000m limit, the container gets 100ms of CPU time per 100ms period, summed across all of its threads. If a burst of parallel work (for example, 10 threads running on 10 cores) burns through that quota in 10ms of wall-clock time, the kernel throttles every thread in the container for the remaining 90ms of the period, and incoming requests queue. Average utilization still looks low because it is measured over windows far longer than one 100ms period, so short bursts disappear into the average. The fix is to either remove limits (trust requests) or tune the CFS period; Kubernetes makes the latter awkward, because the period is set per node through kubelet configuration rather than per pod.
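The quota arithmetic above can be checked with a small model. This is a simplified sketch, not kernel code: the function names are hypothetical, and it assumes every busy thread runs flat out on its own core from the start of the period.

```python
def cfs_quota_us(limit_millicores: int, period_us: int = 100_000) -> int:
    """CPU time (in microseconds) a container may consume per CFS period."""
    return limit_millicores * period_us // 1000


def throttled_us(limit_millicores: int, busy_threads: int,
                 period_us: int = 100_000) -> int:
    """Wall-clock microseconds per period spent throttled.

    Assumption: `busy_threads` threads each burn one core continuously,
    so the shared quota drains `busy_threads` times faster than wall time.
    """
    quota = cfs_quota_us(limit_millicores, period_us)
    wall_until_exhausted = min(period_us, quota // busy_threads)
    return period_us - wall_until_exhausted


# 1000m limit, default 100ms period, 10 parallel threads:
print(throttled_us(1000, 10))           # 90000 (90ms stalled per period)
# Same limit with a single busy thread never exceeds the quota:
print(throttled_us(1000, 1))            # 0
# Same burst with a 10ms period: each stall is much shorter.
print(throttled_us(1000, 10, 10_000))   # 9000 (9ms stalled per period)
```

Under these assumptions the fraction of each period spent throttled is the same for both period lengths; what changes is the length of each individual stall, which is what a queued request actually waits for.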
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:45:50.474694+00:00— report_created — created