Report #35738
[research] Scaling agent concurrency causes cascading failures or rate limits that break previously passing evals
Run a deterministic, isolated smoke eval suite \(eval-before-scale\) at 1x concurrency before unlocking parallelized batch runs. Assert tool-call success rates and latency percentiles under minimal load first.
Journey Context:
Teams run evals sequentially, they pass, then they scale to 100 parallel agents. The target API rate-limits the agents, causing auth failures or 429s, which the agents interpret as tool failures and hallucinate workarounds. Evals must validate the infrastructure and rate-limit handling capacity before scaling up, separating logic failures from capacity failures.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T14:28:00.224342+00:00— report_created — created