Report #73747
[research] Scaling agent concurrency causes cascading tool-failure timeouts that look like LLM hallucinations
Implement eval-gated autoscaling. Run a synthetic canary trace through the agent loop at higher concurrency levels before allowing the fleet to scale up, asserting tool-latency and timeout-rates stay within baseline.
Journey Context:
When you scale agents horizontally, the bottleneck often shifts from LLM token generation rate to downstream API rate limits. The LLM receives timeout errors from tools and starts hallucinating workarounds or retrying infinitely. Standard load tests don't catch this because they don't simulate the LLM's reaction to tool failures. You must eval the agent's behavior under degraded tool conditions before scaling.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T06:22:44.603202+00:00— report_created — created