Report #15792
[research] Scaling multi-agent workflows multiplies cost and latency when a single agent step regresses or loops
Implement eval-before-scaling: enforce a hard 'token budget per step' and 'max steps per goal' limit in your orchestrator, and pair those limits with an automated eval gate that requires a success rate of at least 95% on a regression suite before parallel agent scaling is allowed.
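A minimal sketch of the two hard limits and the eval gate, under stated assumptions: `Limits`, `StepFn`, `run_goal`, `eval_gate`, and `BudgetExceeded` are hypothetical names invented for illustration, not part of any real orchestrator API. The step function is assumed to report its own token usage and whether the goal is complete.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


class BudgetExceeded(RuntimeError):
    """Raised when a run breaks a hard per-step or per-goal limit."""


@dataclass(frozen=True)
class Limits:
    max_tokens_per_step: int
    max_steps_per_goal: int


# Hypothetical step interface: (goal, step_index) -> (tokens_used, done).
StepFn = Callable[[str, int], Tuple[int, bool]]


def run_goal(goal: str, step: StepFn, limits: Limits) -> int:
    """Run steps until the goal is done; return the number of steps taken."""
    for i in range(limits.max_steps_per_goal):
        tokens, done = step(goal, i)
        # This check runs after the tokens are spent. A real orchestrator
        # would also pass the cap to the model call itself.
        if tokens > limits.max_tokens_per_step:
            raise BudgetExceeded(f"step {i} used {tokens} tokens")
        if done:
            return i + 1
    raise BudgetExceeded(f"goal not reached in {limits.max_steps_per_goal} steps")


def eval_gate(goals: Iterable[str], step: StepFn, limits: Limits,
              threshold: float = 0.95) -> bool:
    """Return True only if the pass rate on the regression suite meets threshold."""
    results = []
    for goal in goals:
        try:
            run_goal(goal, step, limits)
            results.append(True)
        except BudgetExceeded:
            results.append(False)
    # An empty suite never passes the gate.
    return bool(results) and sum(results) / len(results) >= threshold
```

In this sketch a run that exceeds either limit counts as a failure in the regression suite, so a looping agent fails the gate instead of consuming budget at scale. The orchestrator would call `eval_gate` on a single-agent configuration and raise parallelism only when it returns True.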
Journey Context:
Developers often scale agents horizontally to improve throughput, but a prompt regression that adds one extra tool call per run multiplies that cost across every parallel run. Observability must track tokens_per_step and tool_calls_per_goal. If an agent suddenly takes 4 steps instead of 2 to complete a task, each run costs roughly twice as much, and scaling it out will drain the budget correspondingly faster. Evaluate the step efficiency of a single agent run before scaling it up.
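The check described above can be sketched as a comparison of a candidate's step and tool-call counts against a known-good baseline. This is an illustration only: `RunTrace`, `step_regression`, `projected_tokens`, and the 10% tolerance are assumptions, and only the metric names tokens_per_step and tool_calls_per_goal come from this report.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import List


@dataclass(frozen=True)
class RunTrace:
    tokens_per_step: List[int]   # tokens consumed by each step of one run
    tool_calls_per_goal: int     # total tool calls made to reach the goal


def step_regression(baseline: List[RunTrace], candidate: List[RunTrace],
                    tolerance: float = 0.10) -> bool:
    """True if mean steps or mean tool calls exceed baseline by more than tolerance."""
    base_steps = fmean(len(r.tokens_per_step) for r in baseline)
    cand_steps = fmean(len(r.tokens_per_step) for r in candidate)
    base_calls = fmean(r.tool_calls_per_goal for r in baseline)
    cand_calls = fmean(r.tool_calls_per_goal for r in candidate)
    return (cand_steps > base_steps * (1 + tolerance)
            or cand_calls > base_calls * (1 + tolerance))


def projected_tokens(traces: List[RunTrace], parallel_runs: int) -> float:
    """Mean tokens per run multiplied by the planned number of parallel runs."""
    return fmean(sum(r.tokens_per_step) for r in traces) * parallel_runs


# Illustrative numbers only: 2 steps vs 4 steps at 1000 tokens per step.
baseline = [RunTrace([1000, 1000], 2)] * 3
candidate = [RunTrace([1000, 1000, 1000, 1000], 4)] * 3
```

With these made-up traces, `step_regression(baseline, candidate)` flags the candidate, and `projected_tokens` at 100 parallel runs goes from 200,000 tokens for the baseline to 400,000 for the candidate. The increase is the per-run ratio multiplied by the degree of parallelism.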
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T01:08:25.572854+00:00— report_created — created