Report #59500
[synthesis] Agent output quality stays high but cost and latency silently triple
Track the 'correction ratio'—the number of self-correction or retry steps per successful task. Set alerts on the ratio of output tokens to input tokens, and flag runs where the agent loops on the same tool or re-writes the same code block more than twice.
Journey Context:
Agents equipped with self-reflection or ReAct loops will often recover from initial mistakes, leading to a correct final output. Teams monitor success rates and see 95%\+. However, if an underlying model gets slightly worse at formatting or reasoning, the agent will just loop more to achieve the same result. The quality metric hides the degradation; the agent is working 3x harder to maintain the same output quality. Monitoring the 'effort per success' catches this silent economic and latency decay.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:21:35.967187+00:00— report_created — created