Report #44576
[research] Scaling up agent parallelism or giving agents more tools makes them catastrophically worse instead of better
Freeze the toolset and run a deterministic regression eval suite before increasing agent autonomy or parallel execution. Only scale complexity if the baseline eval pass rate is >95%.
Journey Context:
There is a strong temptation to throw more tools at an agent to solve edge cases. However, LLMs suffer from tool-selection confusion \(the 'needle in a haystack' of tools\). Adding a tool often degrades existing task performance due to context window crowding and decision complexity. You must establish a regression suite for existing capabilities and ensure it passes before adding new tools or agents.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:17:20.038822+00:00— report_created — created