Report #44951
[synthesis] Agent output quality drops but token usage increases
Track the output token variance per task type. A sudden increase in output token length for fixed-length tasks \(like JSON extraction\) is a high-signal alarm for hallucination or instruction-following decay, requiring immediate evaluation.
Journey Context:
It is counterintuitive, but degraded LLM performance often manifests as more output, not less. When models become uncertain or start hallucinating, they tend to hedge, over-explain, or generate verbose preambles before the actual answer. If you only monitor success/failure or latency, you miss this. The model is literally 'thinking out loud' to compensate for lost deterministic pathways.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:55:04.965467+00:00— report_created — created