Report #41529
[gotcha] Application latency spikes and CPU throttling despite ample idle CPU capacity on the node
Remove CPU limits entirely from pod specs (rely only on requests for scheduling), or set CPU limits significantly higher than requests to accommodate microbursts. Alternatively, disable CFS quota enforcement at the kubelet level with --cpu-cfs-quota=false, which applies to every pod on that node.
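As a rough illustration of the first two options, the sketch below builds the `resources` stanza of a container spec as a plain Python dict. The helper name `container_resources` and the example values are hypothetical and only mirror the shape of the Kubernetes `resources.requests` / `resources.limits` fields; it does not talk to a cluster.

```python
import json
from typing import Optional


def container_resources(cpu_request: str, cpu_limit: Optional[str] = None) -> dict:
    """Build a container `resources` stanza (hypothetical helper).

    With cpu_limit=None the CPU limit is omitted, so no CFS quota is
    applied and the container can burst into idle CPU.
    """
    resources = {"requests": {"cpu": cpu_request}}
    if cpu_limit is not None:
        # Optional: a limit set well above the request leaves burst headroom.
        resources["limits"] = {"cpu": cpu_limit}
    return resources


if __name__ == "__main__":
    # Option 1: request only, no CPU limit.
    print(json.dumps(container_resources("500m"), indent=2))
    # Option 2: limit significantly higher than the request (example values).
    print(json.dumps(container_resources("500m", "4"), indent=2))
```

The same structure would appear under `spec.containers[].resources` in a pod manifest; the values shown are placeholders, not recommendations.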
Journey Context:
Engineers set CPU limits to prevent noisy neighbors, but the Linux CFS (Completely Fair Scheduler) enforces limits as a quota per scheduling period, which is 100ms by default. A pod with a 200m limit gets at most 20ms of CPU time per 100ms period. If the app has microbursts (e.g., Java GC, request deserialization), it can exhaust its quota early in the period and is then throttled until the next period starts, up to 80ms later, causing latency spikes even if the node CPU is 90% idle.

The common mistake is assuming limits provide isolation; they actually cause hard throttling at the quota boundary, regardless of how much CPU is idle. Requests provide soft isolation via CFS shares, which give each pod a proportional share under contention and allow bursting when CPU is available.

The correct pattern for latency-sensitive workloads is to set requests so that capacity is reserved at scheduling time but omit limits entirely, allowing the pod to use idle CPU. If limits are mandatory, as in multi-tenant clusters, they must be set to the node's capacity or CFS quota must be disabled at the kubelet, trading noisy neighbor risk for predictable latency.
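The quota arithmetic above can be sketched in a few lines of Python. This is a simplified model, not the kernel's actual accounting: it assumes a single-threaded burst that starts at a period boundary with the full quota available, and the function names are made up for illustration.

```python
CFS_PERIOD_MS = 100.0  # default CFS period (100000 microseconds)


def quota_ms(limit_millicores: int, period_ms: float = CFS_PERIOD_MS) -> float:
    """CPU time the container may consume per period, in milliseconds."""
    return limit_millicores * period_ms / 1000.0


def burst_wall_time_ms(cpu_needed_ms: float, limit_millicores: int,
                       period_ms: float = CFS_PERIOD_MS) -> float:
    """Wall-clock time to finish a burst needing `cpu_needed_ms` of CPU.

    Assumes one thread, a burst that starts at a period boundary, and
    otherwise idle CPU, so the only delay comes from quota throttling.
    """
    if cpu_needed_ms <= 0:
        return 0.0
    quota = quota_ms(limit_millicores, period_ms)
    full_periods, remainder = divmod(cpu_needed_ms, quota)
    if remainder == 0:
        # The last period is only used up to the quota, not to its end.
        return (full_periods - 1) * period_ms + quota
    return full_periods * period_ms + remainder


if __name__ == "__main__":
    print(quota_ms(200))                # 20.0 ms of CPU per 100 ms period
    print(burst_wall_time_ms(50, 200))  # 210.0 ms wall time for 50 ms of work
```

Under these assumptions, a burst that needs 50ms of CPU takes 210ms of wall-clock time with a 200m limit (two fully throttled periods plus 10ms in the third), versus roughly 50ms with no limit on an idle node.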
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T00:10:43.936380+00:00— report_created — created