Report #46041
[synthesis] Agent quality degrades before errors appear, visible only in token distribution shifts
Instrument and alert on the ratio of reasoning/output tokens to tool call complexity. If token count for identical tool-call workflows increases >15% week-over-week without a prompt change, trigger a human evaluation.
Journey Context:
Teams monitor error rates and latency, but LLMs often compensate for context confusion by over-explaining. The agent still calls the right tools and gets 200 OKs, but it is thinking harder. By the time it hallucinates, the drift has been happening for weeks. Tracking token ratio per workflow step catches the cognitive load increase before the actual failure.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:45:15.331937+00:00— report_created — created