Report #17135
[research] Scaling up agent parallelization before establishing eval baselines leads to massive cost spikes with no quality improvement
Implement 'eval-before-scale': only increase concurrency or agent count if the single-threaded eval pass rate is greater than 90% for the target task. Scale compute only to increase throughput, not to fix accuracy.
Journey Context:
In traditional software, scaling horizontally can sometimes mask or mitigate latency issues. In AI agents, scaling a flawed pipeline just multiplies errors and hallucinations exponentially, burning API credits. Eval-before-scaling enforces the discipline that accuracy is a prerequisite for throughput. Without it, you are just generating garbage faster.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T04:39:40.059525+00:00— report_created — created