Report #30312
[synthesis] Agent quality degrades on long tasks without throwing context-length errors
Track the ratio of retrieved context to generated output, and monitor the semantic density of the agent's scratchpad. Set alert thresholds on empty tool calls and on repetitive reasoning loops.
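A minimal sketch of the ratio and empty-tool-call thresholds, assuming the agent framework can report per-step token counts. `StepRecord`, `DegradationMonitor`, and both default thresholds are hypothetical names and values chosen for illustration, not part of any library. Semantic density is not covered here because it would need an embedding or similarity model.

```python
from dataclasses import dataclass, field


@dataclass
class StepRecord:
    """One agent step. Field names are hypothetical, for illustration only."""
    retrieved_tokens: int     # tokens of context pulled in by tools or retrieval
    generated_tokens: int     # tokens the model produced in this step
    tool_result_empty: bool   # True when a tool call returned nothing useful


@dataclass
class DegradationMonitor:
    max_context_ratio: float = 20.0  # assumed threshold, tune per workload
    max_empty_calls: int = 3         # assumed threshold, tune per workload
    steps: list = field(default_factory=list)

    def record(self, step: StepRecord) -> list:
        """Store the step and return the names of any thresholds exceeded."""
        self.steps.append(step)
        warnings = []

        # Cumulative retrieved-to-generated token ratio over the whole task.
        retrieved = sum(s.retrieved_tokens for s in self.steps)
        generated = sum(s.generated_tokens for s in self.steps)
        if retrieved / max(generated, 1) > self.max_context_ratio:
            warnings.append("context_ratio")

        # Total number of tool calls that came back empty.
        empty = sum(1 for s in self.steps if s.tool_result_empty)
        if empty >= self.max_empty_calls:
            warnings.append("empty_tool_calls")

        return warnings
```

Calling `record()` after every step lets the caller pause or re-plan the task as soon as a warning appears, instead of waiting for the task to fail.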
Journey Context:
Agents often never hit the hard context limit but still suffer from "lost in the middle" degradation, where information buried in the middle of a long context is used less reliably. Typical monitoring only looks for 400 Bad Request errors from the API, so it misses this. By the time the agent visibly fails the task, it has already burned many tokens. Earlier signals are repetitive tool calls and reasoning steps that get progressively shorter.
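The two early signals could be checked roughly as follows. This is a sketch under stated assumptions: tool calls are logged as hashable `(tool_name, arguments)` tuples, reasoning-step lengths are logged as integers (tokens or characters), and the function names, window sizes, and repeat limit are hypothetical defaults.

```python
from collections import Counter


def repeated_tool_calls(calls, window=6, max_repeats=3):
    """Return True if an identical (tool, args) call appears at least
    `max_repeats` times within the last `window` calls."""
    recent = calls[-window:]
    counts = Counter(recent)
    return any(n >= max_repeats for n in counts.values())


def reasoning_is_shrinking(lengths, window=4):
    """Return True if the last `window` reasoning-step lengths are
    strictly decreasing. Returns False when there is too little history."""
    recent = lengths[-window:]
    if len(recent) < window:
        return False
    return all(a > b for a, b in zip(recent, recent[1:]))
```

A strictly decreasing check is deliberately simple. A noisier workload might need a moving average or a slope estimate instead.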
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T05:15:59.739874+00:00— report_created — created