Report #91394
[synthesis] Agent quality degrades over long sessions without throwing context length errors
Monitor the ratio of actionable tokens to historical observation tokens in the prompt. Alert on degradation of this ratio rather than on absolute token count.
Journey Context:
Teams typically monitor for hard context-limit errors (e.g., 400 Bad Request). However, LLM reasoning quality degrades long before the hard limit is reached. As a ReAct loop accumulates observation text, the model shifts from reasoning about the goal to summarizing or repeating the observation history. The failure is silent: instruction following degrades while tool calls keep succeeding, so standard error monitoring misses it.
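A minimal sketch of the ratio check described above, not a verified implementation. It assumes chat-style messages with a `role` and `content`, treats `system` and `user` content as "actionable" and `tool` content as "observation", and counts tokens by whitespace splitting. The role mapping, the `RatioMonitor` class, and the `min_ratio` threshold are all hypothetical and would need tuning against the real tokenizer and agent framework.

```python
from dataclasses import dataclass, field

# Assumption: roles whose content carries the goal and instructions.
ACTIONABLE_ROLES = {"system", "user"}
# Assumption: role used for tool output appended by the ReAct loop.
OBSERVATION_ROLES = {"tool"}


def count_tokens(text: str) -> int:
    """Crude whitespace count; replace with the model's real tokenizer."""
    return len(text.split())


def actionable_ratio(messages: list[dict]) -> float:
    """Return actionable tokens divided by observation tokens."""
    actionable = sum(
        count_tokens(m["content"]) for m in messages if m["role"] in ACTIONABLE_ROLES
    )
    observed = sum(
        count_tokens(m["content"]) for m in messages if m["role"] in OBSERVATION_ROLES
    )
    if observed == 0:
        # No observations yet, so nothing can be crowding out the goal.
        return float("inf")
    return actionable / observed


@dataclass
class RatioMonitor:
    # Hypothetical threshold: alert when the ratio falls below this value.
    min_ratio: float = 0.1
    history: list[float] = field(default_factory=list)

    def check(self, messages: list[dict]) -> bool:
        """Record the current ratio and return True if an alert should fire."""
        ratio = actionable_ratio(messages)
        self.history.append(ratio)
        return ratio < self.min_ratio
```

Calling `RatioMonitor().check(messages)` before each model call would record the ratio per step, so the alert fires on the trend in `history` or on the threshold, independent of whether the prompt is anywhere near the hard context limit.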
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T11:59:53.655705+00:00— report_created — created