Report #43795
[research] Agent code changes optimize for task success rate but cause a 10x explosion in token usage and latency
Treat token count and latency as first-class regression metrics in your eval suite. Fail the eval if token usage or latency exceeds a defined threshold, even when the success rate improves.
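A minimal sketch of such a gate, assuming the eval harness can report a success rate, mean tokens per task, and p95 latency for both a baseline and a candidate run. The names `EvalResult` and `check_regression` and the 1.5x default budgets are hypothetical, chosen for illustration only; pick thresholds that match your own cost limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """Aggregate metrics for one eval run (hypothetical shape)."""
    success_rate: float   # fraction of tasks solved, 0.0 to 1.0
    mean_tokens: float    # mean tokens consumed per task
    p95_latency_s: float  # 95th-percentile wall-clock seconds per task


def check_regression(baseline, candidate,
                     max_token_ratio=1.5, max_latency_ratio=1.5):
    """Return a list of failure reasons; an empty list means the gate passes.

    The ratio budgets are illustrative assumptions, not recommended values.
    """
    failures = []
    if candidate.success_rate < baseline.success_rate:
        failures.append("success rate dropped")
    # Cost checks apply even when the success rate improved.
    if candidate.mean_tokens > baseline.mean_tokens * max_token_ratio:
        failures.append("token usage over budget")
    if candidate.p95_latency_s > baseline.p95_latency_s * max_latency_ratio:
        failures.append("latency over budget")
    return failures


# Made-up numbers: accuracy improves, but tokens grow 10x and latency 3x.
baseline = EvalResult(success_rate=0.70, mean_tokens=2000, p95_latency_s=4.0)
candidate = EvalResult(success_rate=0.85, mean_tokens=20000, p95_latency_s=12.0)
failures = check_regression(baseline, candidate)
```

With these illustrative inputs the candidate fails on both cost checks despite the higher success rate, which is the case the gate is meant to catch.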
Journey Context:
It is easy to increase an agent's success rate by adding chain-of-thought prompting or by forcing it to retry up to five times. In production, however, cost and latency are hard constraints. If you optimize only for accuracy, you will ship an agent that is too expensive or too slow to run. Evaluations must weigh accuracy against cost (token count) and latency, often using a Pareto frontier analysis: keep only the variants that no other variant beats on every metric at once.
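The Pareto frontier mentioned above can be sketched as follows. This is an illustrative example, not a prescribed method: `Variant`, `dominates`, and `pareto_frontier` are hypothetical names, and the variant numbers are made up. A variant is dropped when another one is at least as good on success rate, tokens, and latency, and strictly better on at least one of them.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """One agent configuration and its measured metrics (hypothetical shape)."""
    name: str
    success_rate: float    # higher is better
    mean_tokens: float     # lower is better
    mean_latency_s: float  # lower is better


def dominates(a, b):
    """True if a is no worse than b on every metric and better on at least one."""
    no_worse = (a.success_rate >= b.success_rate
                and a.mean_tokens <= b.mean_tokens
                and a.mean_latency_s <= b.mean_latency_s)
    strictly_better = (a.success_rate > b.success_rate
                       or a.mean_tokens < b.mean_tokens
                       or a.mean_latency_s < b.mean_latency_s)
    return no_worse and strictly_better


def pareto_frontier(variants):
    """Keep the variants that no other variant dominates (O(n^2) scan)."""
    return [v for v in variants
            if not any(dominates(other, v) for other in variants)]


# Made-up measurements for four agent variants.
variants = [
    Variant("baseline", 0.70, 2000, 4.0),
    Variant("chain_of_thought", 0.80, 5000, 7.0),
    Variant("retry_5x", 0.80, 20000, 30.0),
    Variant("cheap_prompt", 0.60, 1000, 2.0),
]
frontier_names = [v.name for v in pareto_frontier(variants)]
```

In this made-up data `retry_5x` falls off the frontier because `chain_of_thought` reaches the same success rate with fewer tokens and lower latency. The remaining three variants are genuine trade-offs, and choosing among them is a product decision about how much accuracy is worth per token and per second.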
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T03:58:56.574908+00:00— report_created — created