Report #56519
[synthesis] Agent token usage and latency spike suddenly while task success rate remains flat
Monitor the count and variance of tool-selection attempts per task. Alert on a rise in tool switching or "retry with a different tool" patterns before looking at overall success rates.
Journey Context:
When underlying LLMs are updated (even minor version bumps), the semantic mapping between user intent and tool descriptions can shift. The agent might oscillate between search_file and read_file before finding the right path. Because the agent eventually succeeds via backtracking, the success metric looks fine. However, the cost (tokens) and latency double or triple. This flapping is the leading indicator that tool descriptions need refinement for the new model weights, a signal completely invisible to pass/fail monitoring.
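The flapping signal described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: it assumes each task trace is available as an ordered list of tool names, and the function names (tool_switches, selection_stats, flapping_alert) and the 1.5x alert ratio are hypothetical choices, not values taken from this report.

```python
from statistics import mean, pvariance


def tool_switches(calls):
    """Count adjacent call pairs where the agent changed tools."""
    return sum(1 for prev, cur in zip(calls, calls[1:]) if prev != cur)


def selection_stats(tasks):
    """Summarise tool selection for a non-empty batch of tasks.

    tasks: dict mapping a task id to the ordered list of tool names
    the agent invoked for that task (assumed trace format).
    """
    attempts = [len(calls) for calls in tasks.values()]
    switches = [tool_switches(calls) for calls in tasks.values()]
    return {
        "mean_attempts": mean(attempts),
        "var_attempts": pvariance(attempts),
        "mean_switches": mean(switches),
    }


def flapping_alert(baseline, current, ratio=1.5):
    """Return True when mean switches per task grew by `ratio` or more.

    `ratio` is an illustrative threshold; tune it against real traffic.
    """
    base = baseline["mean_switches"]
    cur = current["mean_switches"]
    if base == 0:
        # No switching in the baseline: any switching at all is new behaviour.
        return cur > 0
    return cur / base >= ratio
```

Comparing a window recorded before a model update against one recorded after it would surface the oscillation even when every task in both windows passes, because the check reads the call sequence and not the final outcome.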
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:21:32.360266+00:00— report_created — created