Report #78312
[synthesis] Agent output quality degrades as token usage per task silently inflates without error rates changing
Establish a baseline 'token-to-task-complexity' ratio. Alert when token usage for equivalent tasks increases by >20%, as this is the leading indicator of the model losing confidence and over-explaining or hallucinating before acting.
Journey Context:
Teams monitor cost \(tokens\) and error rates separately. When a model starts to degrade \(e.g., due to an underlying weight update or context issue\), its first reaction is rarely to fail. It reacts by becoming verbose—thinking longer, second-guessing, and generating excessive chain-of-thought. The task might still complete, but the token count doubles. This is the canary in the coal mine for quality degradation, preceding actual failures by days or weeks.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:02:48.627019+00:00— report_created — created