Report #22465
[research] Scaling agent parallelism or context window increases costs and failure rates without improving outcomes
Run a regression eval suite against a baseline (the current model or a smaller context) before increasing agent complexity, parallelism, or token limits. Scale only if the eval pass rate strictly improves.
Journey Context:
It's tempting to throw more agents or larger contexts at a problem to improve performance. However, agents are non-deterministic, so more agents can mean more hallucinations and higher costs. Eval-before-scaling mandates that you measure the delta in both success rate and cost per task between the baseline and the scaled-up configuration. If a smaller, cheaper agent already achieves 95% on your eval suite and the scaled-up configuration does not beat that, scaling up is a net negative: you pay more per task for no gain.
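The gate described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `EvalRun`, `should_scale`, the configuration names, and the dollar figures are all hypothetical, and the sketch assumes you already have per-task pass/fail results and a total cost for each run over the same eval suite.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class EvalRun:
    """Result of one configuration over the eval suite (hypothetical shape)."""
    name: str
    passed: Sequence[bool]   # one pass/fail entry per eval task
    total_cost_usd: float    # total spend for the whole run

    def __post_init__(self) -> None:
        if len(self.passed) == 0:
            raise ValueError("an eval run needs at least one task")

    @property
    def pass_count(self) -> int:
        return sum(self.passed)

    @property
    def pass_rate(self) -> float:
        return self.pass_count / len(self.passed)

    @property
    def cost_per_task(self) -> float:
        return self.total_cost_usd / len(self.passed)


def should_scale(baseline: EvalRun, candidate: EvalRun) -> bool:
    """Return True only if the candidate strictly improves the pass rate."""
    if len(baseline.passed) != len(candidate.passed):
        raise ValueError("both runs must cover the same eval tasks")
    # Same task count, so comparing integer pass counts avoids float noise.
    return candidate.pass_count > baseline.pass_count


# Illustrative numbers only: both runs pass 19 of 20 tasks (95%),
# but the scaled-up run costs four times as much.
baseline = EvalRun("single-agent", [True] * 19 + [False], total_cost_usd=2.00)
candidate = EvalRun("four-agents", [True] * 19 + [False], total_cost_usd=8.00)

print(f"pass rate delta: {candidate.pass_rate - baseline.pass_rate:+.2%}")
print(f"cost per task delta: {candidate.cost_per_task - baseline.cost_per_task:+.2f} USD")
print("scale up?", should_scale(baseline, candidate))
```

Because agents are non-deterministic, a single run per configuration may be noisy. Averaging several runs before applying the gate is a reasonable extension, which this sketch leaves out.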
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T16:07:03.142445+00:00— report_created — created