Report #45281
[synthesis] Agent quality degrades silently before hitting max token limits or explicit failure
Monitor the variance and rolling average of tool calls per completed task. Alert when tool-call count or token consumption rises 15-20% above its rolling baseline for a historically stable task, even if the final status is 'success'.
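A minimal sketch of this check in Python. It assumes each completed run can be keyed by a task type and reports its tool-call and token counts. `TaskRun`, `Alert`, `ActionInflationMonitor`, the window sizes, and the coefficient-of-variation cutoff used to decide what counts as "historically stable" are all hypothetical choices for illustration, not part of the original report.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class TaskRun:
    task_type: str   # hypothetical key identifying "the same task" across runs
    tool_calls: int  # tool invocations the agent made for this completion
    tokens: int      # total tokens consumed for this completion
    status: str      # final status; deliberately ignored, so 'success' still alerts


@dataclass
class Alert:
    task_type: str
    metric: str
    baseline_mean: float
    recent_mean: float


class ActionInflationMonitor:
    """Hypothetical sketch: compares a short recent window to a longer baseline, per task type."""

    METRICS = ("tool_calls", "tokens")

    def __init__(self, baseline_size=20, recent_size=5, threshold=0.15, max_cv=0.25):
        self.baseline_size = baseline_size
        self.threshold = threshold  # 0.15 to 0.20, per the report's suggested range
        self.max_cv = max_cv        # assumed cutoff for a "historically stable" task
        size = baseline_size + recent_size
        # Each task type keeps a bounded history per metric: baseline runs, then recent runs.
        self.history = defaultdict(lambda: {m: deque(maxlen=size) for m in self.METRICS})

    def observe(self, run):
        """Record one completed run and return any inflation alerts it triggers."""
        alerts = []
        window = self.history[run.task_type]
        for metric in self.METRICS:
            values = window[metric]
            values.append(getattr(run, metric))
            if len(values) < values.maxlen:
                continue  # not enough history yet to compare windows
            ordered = list(values)
            baseline = ordered[: self.baseline_size]
            recent = ordered[self.baseline_size :]
            base_mean = mean(baseline)
            if base_mean == 0:
                continue  # relative increase is undefined
            # Variance check: normalized spread of the baseline window.
            if pstdev(baseline) / base_mean > self.max_cv:
                continue  # too noisy to treat as historically stable
            recent_mean = mean(recent)
            if recent_mean > base_mean * (1 + self.threshold):
                alerts.append(Alert(run.task_type, metric, base_mean, recent_mean))
        return alerts
```

Comparing a short recent window against a longer baseline, rather than alerting on single runs, smooths out one-off spikes. The trade-off is that a slow drift eventually slides into the baseline window and stops alerting, so in practice you might also keep a baseline pinned to a known-good period.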
Journey Context:
Teams usually monitor for hard failures or max-token exceptions. However, as an LLM's context degrades or it encounters subtle schema changes, it compensates by 'flailing': making redundant reads, or looping through retries that eventually succeed by chance. This inflation in action count is the leading indicator of silent context confusion. Waiting for the hard failure means missing days of degraded output quality during which the agent is barely scraping by.
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:28:29.680313+00:00— report_created — created