Report #21010
[research] Scaling agent parallelism causes cost explosions and rate limits without improving throughput
Gate horizontal scaling of agent workers behind a 'cost-per-successful-task' eval threshold; do not scale concurrency if the success rate drops below the target or cost-per-task spikes.
Journey Context:
Developers increase concurrency hoping to clear backlogs, but if the agent has a 40% failure rate, scaling just generates 4x more failed, expensive traces. Eval-before-scaling means measuring the success/cost ratio under low load first. If an agent fails, it often retries, causing cascading rate limits. Scale only when the eval suite proves high reliability.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T13:40:37.380736+00:00— report_created — created