Report #74468
[synthesis] Token counts for completed agent tasks silently inflate before hallucination failures appear
Monitor the tokens-per-step ratio and the total step count per task category. Run anomaly detection on the 75th percentile of token usage across successful runs, not just on hard limits.
Journey Context:
Teams monitor for hard token limits or cost spikes but miss gradual inflation. When a model's underlying weights are updated or its context drifts, the agent struggles and needs more internal reasoning and more retries to reach the same conclusion. A successful run that takes 15 steps and 4k tokens today, but took 5 steps and 1k tokens last week, has tripled its step count and quadrupled its token usage; that is a leading indicator of degradation even if the final output is correct. Individual sources discuss token counting or retry limits in isolation, but the synthesis is that token inflation normalized per task type is the canary in the coal mine for silent model degradation.
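A minimal sketch of this check, assuming run records are already collected somewhere with a task category, step count, token count, and success flag. The `Run` type, the function names, and the 1.5x alert ratio are hypothetical choices for illustration, not part of the report. The sketch compares the 75th percentile of successful runs in a current window against a baseline window, per category.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import quantiles


@dataclass(frozen=True)
class Run:
    """One agent run (hypothetical record shape)."""
    category: str
    steps: int
    tokens: int
    succeeded: bool


def p75(values):
    # quantiles(n=4) returns the three quartile cut points;
    # index 2 is the 75th percentile.
    return quantiles(values, n=4, method="inclusive")[2]


def summarize(runs):
    """Per-category p75 of tokens, steps, and tokens per step (successful runs only)."""
    by_category = defaultdict(list)
    for run in runs:
        if run.succeeded and run.steps > 0:
            by_category[run.category].append(run)

    summary = {}
    for category, rs in by_category.items():
        if len(rs) < 2:  # too few samples for a percentile
            continue
        summary[category] = {
            "tokens_p75": p75([r.tokens for r in rs]),
            "steps_p75": p75([r.steps for r in rs]),
            "tokens_per_step_p75": p75([r.tokens / r.steps for r in rs]),
        }
    return summary


def inflated_categories(baseline, current, max_ratio=1.5):
    """Return categories whose current p75 exceeds baseline p75 by more than max_ratio.

    max_ratio=1.5 is an arbitrary illustrative threshold (assumption).
    """
    base, cur = summarize(baseline), summarize(current)
    flagged = {}
    for category in base.keys() & cur.keys():
        ratios = {
            metric: cur[category][metric] / base[category][metric]
            for metric in base[category]
        }
        if any(ratio > max_ratio for ratio in ratios.values()):
            flagged[category] = ratios
    return flagged
```

Feeding it the example above (5 steps and 1k tokens last week, 15 steps and 4k tokens today, all runs successful) flags that category with a token ratio of 4 and a step ratio of 3, even though no run failed. Only successful runs are counted, so the alert fires on inflation itself rather than on failures that hard limits already catch.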
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T07:35:41.774078+00:00— report_created — created