Report #56189
[gotcha] High tail latency and timeouts in Cloud Run despite low CPU utilization when max concurrency > 1
Set `--concurrency=1` for CPU-bound workloads (e.g., image processing, heavy JSON parsing) so that each request gets a full vCPU. Alternatively, allocate multiple CPUs per instance and keep concurrency at or below the CPU count for compute-heavy tasks.
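As a rough sketch of the rule above, the helper below builds a `gcloud run deploy` argument list in which concurrency is pinned to the vCPU count. The function name, service name, and image path are hypothetical and only for illustration; `--cpu` and `--concurrency` are the deploy flags this report refers to.

```python
def cloud_run_deploy_args(service: str, image: str, vcpus: int = 1) -> list[str]:
    """Build gcloud arguments for a CPU-bound service (hypothetical helper).

    Concurrency is set equal to the vCPU count, so each in-flight
    request can have a full vCPU to itself.
    """
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    return [
        "gcloud", "run", "deploy", service,
        f"--image={image}",
        f"--cpu={vcpus}",
        f"--concurrency={vcpus}",  # never more requests than vCPUs
    ]


if __name__ == "__main__":
    # Hypothetical service and image names; print the command instead of running it.
    print(" ".join(cloud_run_deploy_args("resize", "gcr.io/my-project/resize")))
```

With the default `vcpus=1` this yields `--cpu=1 --concurrency=1`, the horizontal-scaling variant. Check the printed command against your own project before running it.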
Journey Context:
Developers often assume that Cloud Run's concurrency setting (default 80) works like an HTTP server worker pool. They deploy a Flask/FastAPI app that does CPU-intensive work. At concurrency=20, they see 5s latency spikes even though the CPU graph shows 20% usage. The trap: a Cloud Run instance has a fixed number of CPUs (default 1), and all concurrent requests on that instance share them via Linux CFS scheduling. A single CPU-bound request can starve the others, and several CPU-bound requests time-slice the same vCPU, so each one takes longer as more are in flight. The fix is horizontal scaling (concurrency=1, so additional load goes to new instances) or vertical scaling (several CPUs per instance with low concurrency).
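The arithmetic behind the latency spike can be approximated with a simple fair-sharing model. This is a back-of-the-envelope sketch, not a description of how CFS actually schedules: it assumes every in-flight request is purely CPU-bound and that the scheduler splits CPU time evenly. The 0.25 s of CPU work per request is an assumed figure chosen for illustration.

```python
def estimated_latency_s(cpu_work_s: float, in_flight: int, vcpus: int = 1) -> float:
    """Approximate wall-clock latency under even CPU sharing.

    cpu_work_s: CPU time one request needs when it runs alone (assumed).
    in_flight:  number of CPU-bound requests running at the same time.
    vcpus:      vCPUs available to the instance.
    """
    if in_flight < 1 or vcpus < 1:
        raise ValueError("in_flight and vcpus must be at least 1")
    # Requests only slow down once there are more of them than vCPUs.
    slowdown = max(1.0, in_flight / vcpus)
    return cpu_work_s * slowdown


if __name__ == "__main__":
    for in_flight, vcpus in [(1, 1), (20, 1), (4, 4)]:
        latency = estimated_latency_s(0.25, in_flight, vcpus)
        print(f"in_flight={in_flight:2d} vcpus={vcpus} -> ~{latency:.2f}s")
```

Under these assumptions, 20 concurrent requests on one vCPU each take about 20 times their solo CPU time, whereas one request per vCPU keeps latency at the solo figure.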
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:48:25.037171+00:00— report_created — created