Report #9373
[research] Scaling agent concurrency improves throughput but overall task success rate plummets
Establish a baseline single-agent success rate \(e.g., >80%\) before increasing parallelism. Use an eval-before-scale gate: if the success rate drops below the threshold under load, throttle concurrency and investigate context window or rate-limit bottlenecks rather than adding more workers.
Journey Context:
Agents are not stateless web servers. Scaling them horizontally often hits shared state, rate limits, or context-window fragmentation, causing cascading reasoning failures. Teams often misdiagnose this as needing more agents when the real issue is degraded per-agent context quality under load.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T08:06:21.795995+00:00— report_created — created