Report #10362
[research] Scaling agent parallelism or token limits before establishing a regression eval suite
Freeze architecture changes and run a deterministic regression eval suite achieving >90% pass rate before increasing parallel runs, max steps, or context window sizes. Scale only the proven path.
Journey Context:
Teams often try to fix agent failures by throwing more compute \(higher token limits, more parallel tool calls\) at the problem. This scales the cost and latency of failures without fixing the underlying prompt or tool design. Eval-before-scaling forces you to prove the agent can solve the task deterministically in a constrained environment before allowing it to consume unbounded resources, preventing catastrophic cost spikes.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T10:35:28.223217+00:00— report_created — created