Report #17941
[architecture] Service OOMs or latency spikes under load due to unbounded request queueing
Implement bounded queues with rejection (503) and adaptive concurrency limits based on latency feedback (gradient or CoDel algorithms)
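A minimal sketch of the fail-fast half in Python. It assumes a hypothetical `BoundedDispatcher` front door and a toy `Response` type, neither of which comes from a specific framework. The queue has a fixed `maxsize`. `put_nowait` raises `queue.Full` instead of blocking, and the sketch translates that into a 503 rather than letting work pile up in memory.

```python
import queue
from dataclasses import dataclass


@dataclass
class Response:
    # Toy response type (hypothetical); stands in for a real HTTP response.
    status: int
    body: str


class BoundedDispatcher:
    """Hypothetical front door: admits work into a fixed-size queue, rejects when full."""

    def __init__(self, max_queued: int):
        # maxsize > 0 makes the queue bounded; this is the cap on hidden queueing.
        self._q: queue.Queue = queue.Queue(maxsize=max_queued)

    def submit(self, request) -> Response:
        try:
            # Never block the caller: either there is room right now or we shed load.
            self._q.put_nowait(request)
        except queue.Full:
            return Response(503, "overloaded, retry later")
        return Response(202, "accepted")

    def take(self):
        # Called by a worker; raises queue.Empty if nothing is waiting.
        return self._q.get_nowait()


d = BoundedDispatcher(max_queued=2)
print([d.submit(i).status for i in range(3)])
```

With `max_queued=2`, the third submission is rejected with 503 until a worker calls `take()` and frees a slot. In a real HTTP service the 503 would typically also carry a `Retry-After` header so clients know when to try again.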
Journey Context:
Thread pools with unbounded queues absorb spikes until memory is exhausted (a hidden queue): latency climbs while work waits, and the process eventually OOMs. Bounded queues fail fast but need sensible limits. Static limits (e.g., max 100 concurrent requests) break when available capacity shifts (noisy neighbors, GC pauses). Adaptive limits instead track RTT or queue depth. Gradient-based limiters shrink the concurrency limit when observed latency rises above a baseline. CoDel (Controlled Delay) drops queued requests once queueing delay has stayed above a target for a full interval. Either way, the service applies backpressure to upstream clients, forcing them to retry or degrade gracefully rather than overwhelm the system. This is essential for preventing cascading failure.
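The gradient idea can be sketched as a limiter that compares the best RTT seen so far (a proxy for no-load latency) against each new sample. This is a simplified sketch loosely modeled on gradient-style limiters such as those in Netflix's concurrency-limits library, not a port of any of them. The class name, parameters, and constants (`queue_size`, `smoothing`, the 0.5 floor on the gradient) are illustrative assumptions.

```python
import math


class GradientLimiter:
    """Hypothetical gradient-style adaptive concurrency limit (illustrative sketch)."""

    def __init__(self, initial_limit=20, min_limit=1, max_limit=200,
                 queue_size=4, smoothing=0.2):
        self.limit = float(initial_limit)
        self.min_limit = min_limit
        self.max_limit = max_limit
        self.queue_size = queue_size  # assumed headroom allowed above the no-queueing point
        self.smoothing = smoothing    # assumed weight given to each new target
        self.min_rtt = math.inf       # best observed RTT, used as the no-load baseline
        self.in_flight = 0

    def try_acquire(self) -> bool:
        # Refuse new work once in-flight requests reach the current limit;
        # the caller should reject (e.g. with a 503).
        if self.in_flight >= int(self.limit):
            return False
        self.in_flight += 1
        return True

    def release(self, rtt: float) -> None:
        self.in_flight -= 1
        self.min_rtt = min(self.min_rtt, rtt)
        # Gradient < 1 when this sample is slower than the baseline (queueing is
        # building); clamped to [0.5, 1.0] so one slow sample cannot collapse the limit.
        gradient = max(0.5, min(1.0, self.min_rtt / rtt))
        target = self.limit * gradient + self.queue_size
        # Exponential smoothing damps oscillation between samples.
        new_limit = (1 - self.smoothing) * self.limit + self.smoothing * target
        self.limit = max(self.min_limit, min(self.max_limit, new_limit))
```

When samples stay near `min_rtt` the gradient is 1 and the limit grows by `smoothing * queue_size` per sample. When RTT doubles or worse, the gradient bottoms out at 0.5 and the limit decays toward a small steady state. Two simplifications are worth flagging: `min_rtt` never resets, so a permanent slowdown is read as overload indefinitely (real limiters periodically re-probe the baseline), and the counters are not thread-safe.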
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T06:49:46.133505+00:00— report_created — created