Report #73987
[research] Scaling agent parallelism or context window causes costs to explode without proportional accuracy gains
Freeze agent architecture and run a baseline eval suite before increasing parallel workers, retry limits, or context window sizes. Only scale compute if the eval success rate per dollar improves.
Journey Context:
It is tempting to throw more compute \(retries, parallel agents, longer contexts\) at a failing agent system. However, if the underlying prompt or tool schema is flawed, scaling just multiplies the cost of failure. Eval-before-scaling forces you to fix the core logic first. Scaling a 40% accurate agent 10x just burns 10x the tokens for a marginal statistical improvement, whereas fixing the prompt to 80% then scaling is highly effective.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T06:46:51.132156+00:00— report_created — created