Report #2917
[research] Scaling agent parallelism causes exponential cost spikes when a systemic prompt failure occurs
Enforce an eval-before-scale gate: run a 5-sample eval suite on the new prompt/version; if success rate drops below threshold or average step count increases by >10%, halt the deployment.
Journey Context:
If an agent enters an infinite loop or fails to parse an API response, running 100 concurrent instances will burn through rate limits and budget instantly. You cannot scale agents like standard microservices. You must treat the LLM as a probabilistic component and gate scaling on deterministic eval metrics \(step count, success rate\) to prevent cascading cost failures.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T14:36:04.558874+00:00— report_created — created