Report #94832
[research] Scaling agent inference by increasing step count or beam width causes performance to drop instead of improve
Measure step-wise success rate before scaling. Only increase step count or tree-of-thought branching if the single-step success rate exceeds ~95%. If it is lower, fix the base model or prompt first.
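The check above can be sketched as a small gate over logged step outcomes. This is only an illustration: the names `step_success_rate`, `should_scale`, and `SCALING_THRESHOLD` are hypothetical, and it assumes each agent step has already been labeled as a success or failure by some evaluation you trust.

```python
from typing import Sequence

# Assumption: the ~95% cutoff from the recommendation, expressed as a fraction.
SCALING_THRESHOLD = 0.95


def step_success_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of logged single steps that were judged successful."""
    if not outcomes:
        raise ValueError("need at least one logged step outcome")
    return sum(outcomes) / len(outcomes)


def should_scale(outcomes: Sequence[bool],
                 threshold: float = SCALING_THRESHOLD) -> bool:
    """True only if the measured single-step success rate exceeds the threshold."""
    return step_success_rate(outcomes) > threshold


if __name__ == "__main__":
    # Hypothetical eval log: 90 successful steps out of 100.
    logged = [True] * 90 + [False] * 10
    print(step_success_rate(logged), should_scale(logged))
```

With a small eval set the measured rate is noisy, so a borderline result is probably better treated as "do not scale yet".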
Journey Context:
A common mistake is assuming that more compute (more steps, more branches) means better results. In agents, errors compound multiplicatively: if every step must succeed and each step independently has a 90% success rate, a 5-step agent succeeds only about 59% of the time (0.9^5 ≈ 0.59). Scaling compute on a flawed step just generates more failing paths. Eval-before-scaling is a core finding in test-time compute research.
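The compounding arithmetic can be checked with a few lines of Python. This is a minimal sketch under the same simplifying assumption as the text: steps succeed independently and the run fails if any step fails. The function names are hypothetical.

```python
import math


def chain_success(p_step: float, n_steps: int) -> float:
    """End-to-end success probability of n independent steps that must all succeed."""
    return p_step ** n_steps


def max_steps_for_target(p_step: float, target: float) -> int:
    """Largest step count whose end-to-end success probability is still >= target."""
    if not (0.0 < p_step < 1.0) or not (0.0 < target < 1.0):
        raise ValueError("p_step and target must both lie strictly between 0 and 1")
    # Solve p_step ** n >= target for n; both logs are negative, so the ratio is positive.
    return math.floor(math.log(target) / math.log(p_step))


if __name__ == "__main__":
    for p in (0.90, 0.95, 0.99):
        print(p, round(chain_success(p, 5), 3), max_steps_for_target(p, 0.5))
```

At a 90% step rate the step budget for a better-than-even overall result is 6 steps; at 95% it is 13. Real agents can sometimes recover from a bad step, so these numbers are a rough guide rather than a measurement.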
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T17:45:25.626891+00:00— report_created — created