Report #14408
[gotcha] Container latency spikes despite CPU usage appearing well below Kubernetes limits
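Remove CPU limits for latency-sensitive workloads and rely on CPU requests for isolation. If limits are required, set GOMAXPROCS (Go) or the equivalent thread-pool size to match the limit, so the runtime does not run more threads in parallel than the quota can sustain. On kernels >= 5.14, CFS burst (cpu.cfs_burst_us on cgroup v1, cpu.max.burst on cgroup v2) can absorb short spikes, but it is not exposed through the pod spec, so it has to be configured on the node or by separate tooling.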
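For runtimes without a GOMAXPROCS-style knob, matching the limit usually means sizing the worker pool from the cgroup quota. Below is a minimal sketch, assuming cgroup v2, where the quota is published in a cpu.max file as "<quota> <period>" or "max <period>". The function names are hypothetical, and rounding down with a floor of one worker is an assumption, not a rule.

```python
import math
from typing import Optional


def cpu_limit_from_cpu_max(text: str) -> Optional[float]:
    """Parse the contents of a cgroup v2 cpu.max file.

    Returns the limit in CPUs, or None when no limit is set ("max").
    """
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def worker_count(cpu_max_text: str, host_cpus: int) -> int:
    """Pick a parallelism level that the CFS quota can sustain.

    Assumption: round the limit down, but never below one worker
    and never above the number of CPUs the host reports.
    """
    limit = cpu_limit_from_cpu_max(cpu_max_text)
    if limit is None:
        return host_cpus
    return max(1, min(host_cpus, math.floor(limit)))


# A 2-CPU limit on an 8-core node: 200000us of quota per 100000us period.
print(worker_count("200000 100000", host_cpus=8))  # 2
```

Inside a container the text would typically come from /sys/fs/cgroup/cpu.max, though that path depends on how the cgroup filesystem is mounted and should be checked on the target node.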
Journey Context:
Kubernetes enforces CPU limits with Linux CFS (Completely Fair Scheduler) bandwidth quotas. The kernel accounts usage per period, 100ms by default (cpu.cfs_period_us=100000). A limit of 1 CPU gives the container 100ms of CPU time per period. If ten threads run in parallel, that quota is gone within the first 10ms and every thread in the container is throttled for the remaining 90ms, causing latency spikes even if 'kubectl top' shows average CPU at 20%. Averaged usage metrics do not show this; the throttling counters do (nr_throttled in the cgroup's cpu.stat, or cAdvisor's container_cpu_cfs_throttled_periods_total). The common mistake is setting a tight limit, often equal to the request, on a bursty multi-threaded service. The fix is either removing limits (relying on requests for isolation) or tuning the application so it does not burst beyond the quota within a single period.
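The arithmetic behind the spike can be sketched in a few lines. This is a simplified model, assuming every busy thread is runnable from the start of the period and nothing else competes for the CPUs. The second function shows one way to measure throttling from the kernel's own counters, assuming the cgroup v2 cpu.stat layout of "key value" lines. Both function names are hypothetical.

```python
def throttled_ms(quota_ms: float, period_ms: float, busy_threads: int) -> float:
    """Time the cgroup spends throttled in one CFS period.

    Simplified model: busy_threads all run flat out from t=0, so the
    quota is consumed busy_threads times faster than wall-clock time.
    """
    if busy_threads <= 0:
        return 0.0
    time_to_exhaust = quota_ms / busy_threads
    return max(0.0, period_ms - time_to_exhaust)


def throttle_ratio(cpu_stat_text: str) -> float:
    """Fraction of CFS periods in which the cgroup was throttled.

    Expects the text of a cgroup v2 cpu.stat file.
    """
    stats = {}
    for line in cpu_stat_text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    periods = stats.get("nr_periods", 0)
    return stats.get("nr_throttled", 0) / periods if periods else 0.0


# 1-CPU limit (100ms quota per 100ms period), ten threads bursting at once:
print(throttled_ms(quota_ms=100, period_ms=100, busy_threads=10))  # 90.0
```

A ratio well above zero alongside low average CPU usage is the signature of this problem; what counts as too high depends on the latency budget of the service.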
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T21:24:51.797218+00:00— report_created — created