Report #31561
[gotcha] Kubernetes pods with CPU limits exhibit high P99 latency despite CPU usage being well below the limit
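Remove CPU limits for latency-sensitive services (keep CPU requests, which drive scheduling and the container's CPU weight under contention), or enable CFS bandwidth burst on Linux 5.14+ so short bursts can draw on banked, unused quota (the kubelet flag --cpu-cfs-quota-burst is given for this; confirm your kubelet version supports it, as the documented upstream CFS flags are --cpu-cfs-quota and --cpu-cfs-quota-period). Alternatively, use the Guaranteed QoS class with generous limits (Guaranteed requires requests to equal limits, so this also reserves that CPU), though this is less effective than removing limits entirely.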
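A quick way to confirm a pod is hitting this gotcha is to read its CFS throttling counters. The following is a minimal sketch, assuming cgroup v2 and that the container's own cgroup is visible at /sys/fs/cgroup (typical with cgroup namespaces; the path is an assumption, so adjust it for your runtime). The helper names `parse_cpu_stat` and `throttling_summary` are illustrative, not part of any library.

```python
from pathlib import Path


def parse_cpu_stat(text: str) -> dict[str, int]:
    """Parse cgroup v2 cpu.stat content ("key value" per line) into a dict."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def throttling_summary(stats: dict[str, int]) -> tuple[float, float]:
    """Return (fraction of enforcement periods that were throttled,
    total throttled time in seconds) from parsed cpu.stat counters."""
    periods = stats.get("nr_periods", 0)
    throttled = stats.get("nr_throttled", 0)
    ratio = throttled / periods if periods else 0.0
    seconds = stats.get("throttled_usec", 0) / 1_000_000
    return ratio, seconds


if __name__ == "__main__":
    # Assumed location of the container's own cgroup under cgroup v2.
    cpu_stat = Path("/sys/fs/cgroup/cpu.stat")
    try:
        ratio, seconds = throttling_summary(parse_cpu_stat(cpu_stat.read_text()))
        print(f"throttled in {ratio:.1%} of periods, {seconds:.2f}s in total")
    except OSError as exc:
        print(f"could not read {cpu_stat}: {exc}")
```

The counters are cumulative since the cgroup was created, so compare two samples taken a few seconds apart. A rising nr_throttled while average CPU usage stays well under the limit is the signature described in this report.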
Journey Context:
Linux CFS (Completely Fair Scheduler) bandwidth control enforces a CPU limit as a quota of CPU time per period (default period 100ms); a limit of 1 CPU becomes 100ms of CPU time per 100ms. A multi-threaded container can spend that quota in a fraction of the period: with 10 threads running in parallel, 100ms of CPU time is consumed in the first 10ms of wall time. The container is then throttled (not scheduled at all) for the remaining 90ms, so any request in flight stalls, producing latency spikes even when CPU usage averaged over one second is far below the limit. This is counter-intuitive because limits are usually read as 'maximum capacity' rather than 'a rigid quota per 100ms window'. For latency-sensitive workloads, set requests (which reserve node capacity at scheduling time) but omit limits, accepting the noisy-neighbor risk in exchange for predictable latency, or use kernel-level CFS burst (Linux 5.14+) where available, which lets a group bank unused quota and spend it in short bursts.
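The arithmetic above can be checked with a toy model. This is a minimal sketch under stated assumptions, not the kernel's actual accounting: work splits evenly across threads that each have their own core, the job starts at a period boundary, and CFS burst is modeled as a pool refilled by one quota per period and capped at quota + burst. `cpu_max_for_limit` mirrors the millicores-to-quota conversion the kubelet applies (quota = millicores × period / 1000), ignoring any clamping; both function names are illustrative.

```python
def cpu_max_for_limit(millicores: int, period_us: int = 100_000) -> str:
    """Render a CPU limit in millicores as a cgroup v2 cpu.max line
    ("<quota_us> <period_us>"), using quota = millicores * period / 1000."""
    quota_us = millicores * period_us // 1000
    return f"{quota_us} {period_us}"


def completion_time_ms(work_ms: float, threads: int, quota_ms: float,
                       period_ms: float = 100.0, burst_ms: float = 0.0) -> float:
    """Toy model: wall-clock ms to finish `work_ms` of CPU time split evenly
    across `threads` parallel threads, starting at a period boundary, when the
    group may spend `quota_ms` of CPU time per `period_ms`."""
    if threads <= 0 or quota_ms <= 0:
        raise ValueError("threads and quota_ms must be positive")
    elapsed = 0.0
    remaining = work_ms
    # Assumption: the group was idle beforehand and banked the full burst, so
    # the pool starts at quota + burst (the most the kernel lets it hold).
    pool = quota_ms + burst_ms
    while True:
        # CPU time usable this period: limited by the pool and by how much
        # `threads` cores can run in one period of wall-clock time.
        budget = min(pool, threads * period_ms)
        if remaining <= budget:
            return elapsed + remaining / threads
        # Not finished: the whole budget is used; if the pool ran dry, the
        # group sits throttled until the next period starts.
        remaining -= budget
        elapsed += period_ms
        # Refill by one quota; unused runtime carries over, capped at quota + burst.
        pool = min(pool - budget + quota_ms, quota_ms + burst_ms)


if __name__ == "__main__":
    print(cpu_max_for_limit(1000))  # a 1-CPU limit
    # 150 ms of CPU work on 10 threads under a 1-CPU limit (100 ms per 100 ms).
    print(completion_time_ms(150, threads=10, quota_ms=100))
    # The same work with no limit, and with a 50 ms burst allowance.
    print(completion_time_ms(150, threads=10, quota_ms=float("inf")))
    print(completion_time_ms(150, threads=10, quota_ms=100, burst_ms=50))
```

In this model, 150ms of CPU work on 10 threads under a 1-CPU limit finishes after 105ms of wall time (10ms running, 90ms throttled, 5ms running), versus 15ms with no limit or with a 50ms burst allowance, even though 150ms of CPU once per second averages only 0.15 CPU.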
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T07:21:41.778893+00:00— report_created — created