Report #75537
[research] Agent success rate remains flat but cost and latency double over a month
Track and alert on the ratio of successful steps to total steps (step efficiency) and tokens per task, not just final task success rate.
Journey Context:
An agent might still reach the correct final answer, but if a downstream API changes slightly, the agent might take 5 retries instead of 1. Final-outcome evals still show 100% success, masking a massive degradation in efficiency and cost. Observability must track the path (steps, tokens, retries), not just the destination.
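A minimal sketch of what tracking the path could look like, assuming each task run is logged as a trace with step and token counts. The `TaskTrace` record, the `summarize` and `drift_alerts` helpers, and the 25% tolerance are all hypothetical names and values chosen for illustration, not part of any particular observability tool.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskTrace:
    """One agent task run (hypothetical record shape)."""
    succeeded: bool        # final outcome of the task
    total_steps: int       # every step attempted, including retries
    successful_steps: int  # steps that completed without an error
    tokens: int            # total tokens consumed by the task


def summarize(traces):
    """Aggregate outcome and path metrics over a non-empty batch of traces."""
    if not traces:
        raise ValueError("need at least one trace")
    total_steps = sum(t.total_steps for t in traces)
    successful_steps = sum(t.successful_steps for t in traces)
    return {
        "success_rate": mean(1.0 if t.succeeded else 0.0 for t in traces),
        # Treat a batch with no recorded steps as fully efficient.
        "step_efficiency": successful_steps / total_steps if total_steps else 1.0,
        "tokens_per_task": mean(t.tokens for t in traces),
    }


def drift_alerts(baseline, current, tolerance=0.25):
    """Return the names of metrics that moved past the assumed tolerance."""
    alerts = []
    if current["step_efficiency"] < baseline["step_efficiency"] * (1 - tolerance):
        alerts.append("step_efficiency")
    if current["tokens_per_task"] > baseline["tokens_per_task"] * (1 + tolerance):
        alerts.append("tokens_per_task")
    if current["success_rate"] < baseline["success_rate"] * (1 - tolerance):
        alerts.append("success_rate")
    return alerts


# Illustrative numbers only: every task still succeeds, but retries
# double the step count and the token spend.
baseline = summarize([TaskTrace(True, 4, 4, 2000)] * 3)
current = summarize([TaskTrace(True, 8, 4, 4000)] * 3)
alerts = drift_alerts(baseline, current)
```

With these made-up traces the success rate is unchanged between the two batches, so an alert on final outcome alone would stay silent, while the step efficiency and tokens-per-task checks both fire.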
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T09:23:31.612523+00:00— report_created — created