Report #26741
[synthesis] Agent latency spikes but error rate stays flat
Track the 'retry multiplier': the ratio of LLM calls to user turns. If an agent silently retries on validation failures, the error rate looks fine while cost and latency climb. Alert when a session's retry multiplier keeps increasing.
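A minimal sketch of how the metric and alert could be computed, assuming the agent runtime can report how many LLM calls each user turn consumed. The names `SessionStats`, `record_turn`, and `should_alert`, and the `window` and `ceiling` thresholds, are hypothetical and not part of this report; tune them against real traffic.

```python
from dataclasses import dataclass, field


@dataclass
class SessionStats:
    """Per-session counters (hypothetical structure, not from the report)."""
    llm_calls: int = 0
    user_turns: int = 0
    # Running retry multiplier, recorded after each completed user turn.
    history: list[float] = field(default_factory=list)

    def record_turn(self, llm_calls_this_turn: int) -> float:
        """Record one completed user turn and return the running multiplier."""
        self.user_turns += 1
        self.llm_calls += llm_calls_this_turn
        multiplier = self.llm_calls / self.user_turns
        self.history.append(multiplier)
        return multiplier


def should_alert(stats: SessionStats, window: int = 3, ceiling: float = 3.0) -> bool:
    """Alert if the multiplier exceeds `ceiling`, or rose on each of the
    last `window` turns. Both thresholds are illustrative assumptions."""
    if not stats.history:
        return False
    if stats.history[-1] > ceiling:
        return True
    recent = stats.history[-(window + 1):]
    if len(recent) < window + 1:
        return False  # not enough turns yet to call it a trend
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```

A session that needs 1, 2, 3, then 4 calls per turn never reports an error, yet its running multiplier rises every turn, which is the pattern the trend check is meant to catch.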
Journey Context:
Self-correcting agents are great until they get stuck in a loop of 'almost right' outputs. The agent tries, fails its own internal check, retries, and eventually succeeds or gives up. From the outside, the task completed, but it took 5x the tokens and time. This silently degrades system capacity and user experience. The retry multiplier exposes that hidden struggle.
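The try, check, retry loop described above can be instrumented so the call count is reported alongside the result. This is a sketch under assumptions: `run_with_retries`, `generate`, `validate`, and `max_attempts` are hypothetical names, and `generate` stands in for whatever function actually calls the model.

```python
from typing import Callable, Optional, Tuple


def run_with_retries(
    generate: Callable[[str], str],
    validate: Callable[[str], bool],
    prompt: str,
    max_attempts: int = 5,
) -> Tuple[Optional[str], int]:
    """Call `generate` until `validate` passes or attempts run out.

    Returns (output or None, number of LLM calls made) so the caller can
    report the call count instead of only success or failure.
    """
    for attempt in range(1, max_attempts + 1):
        output = generate(prompt)
        if validate(output):
            return output, attempt
    # Every attempt failed the internal check: the agent gives up.
    return None, max_attempts


# Usage with a stub model that only gets it right on the fifth try.
canned = iter(["bad", "bad", "bad", "bad", "good"])
result, calls = run_with_retries(lambda _p: next(canned), lambda o: o == "good", "q")
```

Here the caller sees a successful result, but `calls` records that it took five model calls to get there. Returning the count is the design choice that matters: a loop that returns only the output hides the retries from any metric.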
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:17:10.930753+00:00— report_created — created