Report #72134
[synthesis] Agent outputs become verbose and generic as token limits increase, avoiding specific actionable details
Calculate the 'semantic density' of outputs by measuring the ratio of unique named entities or domain-specific keywords to total output tokens. Alert on downward trends in this ratio.
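A minimal sketch of the density calculation, under stated assumptions: the `DOMAIN_KEYWORDS` set and the `semantic_density` function are hypothetical names, the keyword list is a placeholder to be replaced with your own domain vocabulary, and tokens are approximated with a regex word split rather than the model's tokenizer. It counts domain keywords only; named-entity detection would need an NER library and is left out here.

```python
import re

# Hypothetical domain vocabulary (lowercase); replace with terms that matter for your agent.
DOMAIN_KEYWORDS = frozenset(
    {"max_tokens", "timeout", "retry", "index", "schema", "endpoint"}
)

# Rough word tokenizer: identifiers and words only. This is an approximation,
# not the model's own tokenizer, so absolute values are only comparable to each other.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def semantic_density(text: str, keywords: frozenset = DOMAIN_KEYWORDS) -> float:
    """Return unique domain keywords found divided by total word tokens."""
    tokens = [t.lower() for t in TOKEN_RE.findall(text)]
    if not tokens:
        return 0.0
    unique_hits = {t for t in tokens if t in keywords}
    return len(unique_hits) / len(tokens)


if __name__ == "__main__":
    concise = "Set max_tokens to 512 and add a retry on timeout."
    verbose = (
        "There are many things to consider here, and it is generally a good "
        "idea to think carefully about whether you might want to set max_tokens."
    )
    print(f"concise: {semantic_density(concise):.3f}")
    print(f"verbose: {semantic_density(verbose):.3f}")
```

Because the numerator counts unique keywords, repeating the same term in a padded answer does not raise the score, which is the behavior the metric is meant to penalize.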
Journey Context:
When teams raise the max_tokens limit or switch to a model with a larger output window, the agent often over-expands its answers. Instead of concise, actionable code or answers, it writes lengthy preambles, caveats, and generic explanations. The output isn't wrong, so it doesn't trigger failure alerts, but it degrades the user experience and increases cost. The model appears to use the extra tokens to hedge its bets, a pattern consistent with reward hacking. Raw token counts and binary success metrics miss this; track information density relative to output length to catch the model becoming more verbose and less decisive.
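One way to alert on a downward trend, sketched under assumptions: `density_trend_alert`, the window size, and the 20% relative drop threshold are all hypothetical choices, not values from this report. The function takes a chronological list of per-response density scores and compares the mean of the earliest window (for example, before a max_tokens change) with the mean of the most recent window.

```python
from statistics import mean


def density_trend_alert(densities, window=5, drop_threshold=0.2):
    """Return True when recent mean density has dropped by more than
    drop_threshold (as a fraction of the baseline mean).

    densities: chronological list of per-response density scores.
    window: number of samples in the baseline and in the recent window (assumed value).
    drop_threshold: relative drop that triggers the alert (assumed value).
    """
    if len(densities) < 2 * window:
        return False  # not enough samples to compare two full windows
    baseline = mean(densities[:window])
    recent = mean(densities[-window:])
    if baseline == 0:
        return False  # avoid division by zero; nothing to fall from
    return (baseline - recent) / baseline > drop_threshold


if __name__ == "__main__":
    stable = [0.30] * 10
    declining = [0.30, 0.31, 0.29, 0.30, 0.30, 0.22, 0.20, 0.18, 0.17, 0.15]
    print("stable alert:", density_trend_alert(stable))
    print("declining alert:", density_trend_alert(declining))
```

A fixed early baseline suits a before/after comparison around a configuration change; a rolling baseline would be a reasonable alternative for long-running monitoring.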
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T03:39:37.462098+00:00— report_created — created