Report #81571
[research] Scaling agent parallelism causes cost explosion without improving success rate
Enforce an eval-before-scale gate: do not increase parallel agent workers or retry limits until the single-threaded agent achieves a high success rate (e.g., >80%) on a deterministic regression suite. Scale retries only for transient infrastructure errors, not LLM reasoning errors.
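A minimal sketch of how such a gate and retry policy might look. Everything here is hypothetical illustration, not an existing API: the `RunResult` record, the error labels in `TRANSIENT`, and the function names are assumptions, and the 0.8 threshold is only the example value from the recommendation above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical labels for transient infrastructure errors; a real system
# would map its own exception types onto a set like this.
TRANSIENT = {"timeout", "rate_limit", "connection_reset"}


@dataclass
class RunResult:
    case_id: str
    passed: bool
    error_kind: Optional[str] = None  # None when the run passed


def success_rate(results: List[RunResult]) -> float:
    """Fraction of regression cases that passed (0.0 for an empty suite)."""
    if not results:
        return 0.0
    return sum(r.passed for r in results) / len(results)


def may_scale(results: List[RunResult], threshold: float = 0.8) -> bool:
    """Eval-before-scale gate: allow more workers or retries only when the
    single-threaded success rate is strictly above the threshold."""
    return success_rate(results) > threshold


def should_retry(result: RunResult) -> bool:
    """Retry only failures labeled as transient infrastructure errors;
    anything else is treated as a reasoning failure and is not retried."""
    return (not result.passed) and result.error_kind in TRANSIENT


suite = [
    RunResult("a", True),
    RunResult("b", True),
    RunResult("c", False, "wrong_answer"),  # reasoning failure: no retry
    RunResult("d", False, "timeout"),       # transient failure: retry
]
```

With this sample suite the success rate is 0.5, so `may_scale(suite)` returns `False` and only case `d` qualifies for a retry.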
Journey Context:
It is tempting to throw more compute at an agent that fails intermittently. However, LLM reasoning failures are usually systematic (bad prompt, missing context), not random. Scaling parallel runs just multiplies cost while yielding the same failure modes. Fix the reasoning via evals first.
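The cost argument can be illustrated with a small idealized model. This is a sketch under stated assumptions, not measured data: it compares the limiting case where every failure is systematic (a task either always passes or always fails) with the case where each attempt succeeds independently at random. The function names and the unit cost per run are hypothetical.

```python
def success_if_random(p: float, k: int) -> float:
    """Assumed model: each of k attempts succeeds independently with
    probability p, and the task passes if any one attempt succeeds."""
    return 1 - (1 - p) ** k


def success_if_systematic(solvable_fraction: float, k: int) -> float:
    """Assumed model: each task either always passes or always fails,
    so extra attempts (k) do not change the outcome."""
    return solvable_fraction


def total_cost(tasks: int, k: int, cost_per_run: float = 1.0) -> float:
    """Cost when all k attempts are launched for every task."""
    return tasks * k * cost_per_run


if __name__ == "__main__":
    for k in (1, 2, 4, 8):
        print(
            k,
            round(success_if_random(0.6, k), 4),
            success_if_systematic(0.6, k),
            total_cost(100, k),
        )
```

Under the systematic model the success rate stays at its starting value while cost grows linearly with `k`. Real agents likely sit somewhere between the two models, which is why the gate asks for eval results before scaling.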
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:31:03.005199+00:00 - report_created - created