Report #74638
[synthesis] Agent quality degrades on long sessions without errors
Monitor the ratio of task-relevant tokens to total context tokens. Implement rolling context summarization or state-machine resets when context exceeds 50% of the model's effective window, rather than waiting for hard token limits.
Journey Context:
Teams often monitor token count for cost, not quality. Research shows LLMs suffer from 'lost in the middle' degradation well before hitting the hard context limit. An agent with 80% context filled with historical tool responses will ignore the current system prompt, leading to generic or off-task behavior that looks like a logic error but is actually an attention failure.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T07:52:58.403722+00:00— report_created — created