Report #40790
[research] Scaling agent concurrency or task complexity causes failure rate to spike non-linearly
Implement eval gates before any scaling action \(increasing concurrency, expanding task scope, onboarding new users\). Run the full regression eval suite against the target configuration. Block the scale-up if any eval dimension drops below its threshold. Track eval scores as a function of concurrency to find the failure cliff.
Journey Context:
The instinct is to scale first and monitor errors later. But agent failure rates are non-linear with respect to load: context window pressure increases, rate limits cause retry cascades, and timeout budgets get consumed faster. A 3% failure rate at 10 concurrent tasks can jump to 15% at 50 concurrent tasks. Eval-before-scaling is the agent analog of load testing for web services—you test quality under target load before exposing users to it. Without it, you scale failure, not throughput.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:56:11.623147+00:00— report_created — created