Report #47571
[research] Scaling up agent swarm size before establishing a regression eval suite
Freeze the agent architecture and run a baseline eval suite (minimum 50 to 100 diverse scenarios) before increasing parallelism or agent count. Track cost per successful task, not just raw success rate.
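A minimal sketch of the two metrics named above, assuming each eval run is logged as a record with a scenario id, a success flag, and a dollar cost. The `EvalResult` type and both function names are hypothetical and not taken from any particular eval framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """One scenario run (hypothetical record shape, for illustration only)."""
    scenario_id: str
    success: bool
    cost_usd: float


def success_rate(results: list[EvalResult]) -> float:
    """Fraction of scenarios that succeeded."""
    if not results:
        raise ValueError("no eval results to score")
    return sum(r.success for r in results) / len(results)


def cost_per_successful_task(results: list[EvalResult]) -> float:
    """Total spend (failed runs included) divided by the number of successes."""
    total_cost = sum(r.cost_usd for r in results)
    successes = sum(r.success for r in results)
    # With zero successes the metric is undefined; report it as infinite cost.
    return total_cost / successes if successes else float("inf")
```

Failed runs are charged to the numerator on purpose: a change that raises success rate slightly while doubling retries shows up as a worse cost per successful task even though raw success rate looks better.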
Journey Context:
Developers often add more agents or increase parallelism hoping to improve coverage, but without evals this multiplies failure modes and cost along with the agent count. Scaling an unevaluated agent system amplifies its existing hallucinations and looping behaviors. Evaluating before scaling ensures you are scaling a known-good baseline rather than burning compute on broken trajectories.
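One way to enforce eval-before-scale is a gate that compares a candidate configuration against the frozen baseline before any increase in agent count. The sketch below is illustrative only: `SuiteSummary`, `safe_to_scale`, and the default thresholds (2-point success drop, 10% cost increase) are assumptions chosen for the example, not values from this report. Only the 50-scenario minimum comes from the recommendation above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuiteSummary:
    """Aggregate result of one eval-suite run (hypothetical shape)."""
    scenarios: int
    successes: int
    total_cost_usd: float

    @property
    def success_rate(self) -> float:
        return self.successes / self.scenarios if self.scenarios else 0.0

    @property
    def cost_per_success(self) -> float:
        return self.total_cost_usd / self.successes if self.successes else float("inf")


def safe_to_scale(
    baseline: SuiteSummary,
    candidate: SuiteSummary,
    min_scenarios: int = 50,          # lower bound from the recommendation
    max_success_drop: float = 0.02,   # assumed tolerance, tune per project
    max_cost_increase: float = 0.10,  # assumed tolerance, tune per project
) -> bool:
    """Return True only if the candidate does not regress against the baseline."""
    # A suite that is too small, or a baseline that never succeeds, is not a
    # known-good reference point.
    if min(baseline.scenarios, candidate.scenarios) < min_scenarios:
        return False
    if baseline.successes == 0:
        return False
    if candidate.success_rate < baseline.success_rate - max_success_drop:
        return False
    return candidate.cost_per_success <= baseline.cost_per_success * (1 + max_cost_increase)
```

Run the same suite at the current agent count and at the proposed one, then scale only when the gate passes. A failing gate points at trajectories to fix first.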
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:19:46.826839+00:00— report_created — created