Report #59305
[synthesis] Agent outputs getting worse in long conversations but no context overflow error
Monitor the context utilization ratio (tokens used / max context window) as a gauge, and segment quality metrics by context fill percentage. Start proactive context management (summarization, retrieval, or context pruning) at 60–70% fill rather than waiting for 90%+. Log the position of key instructions in the context window to detect when they drift into the 'middle' of long contexts.
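A minimal sketch of the gauge and the instruction-position log, assuming token counts are already available from whatever tokenizer or API usage field the agent exposes. The names `ContextGauge`, `relative_position`, and `position_zone` are hypothetical, and the 0.2 edge width used to define the 'middle' is an illustrative assumption, not a value from the report.

```python
from dataclasses import dataclass

# Lower edge of the 60-70% band suggested in the report.
PROACTIVE_THRESHOLD = 0.60


@dataclass
class ContextGauge:
    """Tracks context fill as a ratio instead of waiting for an overflow error."""

    max_context_tokens: int
    threshold: float = PROACTIVE_THRESHOLD

    def utilization(self, tokens_used: int) -> float:
        """Return tokens used / max context window as a 0..1 ratio."""
        if self.max_context_tokens <= 0:
            raise ValueError("max_context_tokens must be positive")
        return tokens_used / self.max_context_tokens

    def needs_management(self, tokens_used: int) -> bool:
        """True once fill reaches the proactive threshold (summarize, retrieve, or prune)."""
        return self.utilization(tokens_used) >= self.threshold


def relative_position(instruction_start: int, tokens_used: int) -> float:
    """Where an instruction begins in the context: 0.0 is the start, 1.0 is the end."""
    if tokens_used <= 0:
        raise ValueError("tokens_used must be positive")
    return instruction_start / tokens_used


def position_zone(rel_pos: float, edge: float = 0.2) -> str:
    """Classify a relative position; `edge` is an assumed width for start/end zones."""
    if rel_pos < edge:
        return "start"
    if rel_pos > 1 - edge:
        return "end"
    return "middle"
```

Logging `utilization` and `position_zone` on every turn gives a time series that can be alerted on, so the trigger for summarization or pruning is a fill ratio rather than an overflow exception.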
Journey Context:
Model quality can degrade as the context window fills, well before the hard token limit is reached. Research on 'lost in the middle' shows that many LLMs make poor use of information placed in the middle of long contexts. Teams that monitor only for context overflow errors miss the gradual quality degradation at 50–80% fill. Outputs become more generic, instructions in the middle get ignored, and hallucination rates increase, but each turn is only slightly worse than the last, so no single turn triggers an alert. The synthesis from context window research and production incident reports: context fill percentage is a quality dial, not just a capacity limit. Quality degrades continuously as fill increases, and the degradation curve steepens after roughly 60% fill.
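Because no single turn is bad enough to alert on, the degradation only shows up when quality scores are grouped by fill level. The sketch below assumes each turn has already been given a numeric quality score by some evaluator (a rubric, an LLM judge, or user feedback). The function names and the 10-point bucket width are hypothetical choices for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def fill_bucket(tokens_used: int, max_tokens: int, width_pct: int = 10) -> int:
    """Return the lower bound (in percent) of the fill bucket for this turn.

    Integer arithmetic avoids float rounding at bucket edges; 100% fill is
    folded into the last bucket.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    pct = min(tokens_used * 100 // max_tokens, 99)
    return (pct // width_pct) * width_pct


def quality_by_fill(
    turns: Iterable[Tuple[int, float]], max_tokens: int, width_pct: int = 10
) -> Dict[str, float]:
    """Average quality score per fill bucket, ordered from emptiest to fullest.

    `turns` yields (tokens_used, quality_score) pairs, one per agent turn.
    """
    grouped = defaultdict(list)
    for tokens_used, score in turns:
        grouped[fill_bucket(tokens_used, max_tokens, width_pct)].append(score)
    return {
        f"{lo}-{lo + width_pct}%": mean(scores)
        for lo, scores in sorted(grouped.items())
    }
```

Comparing the averages across buckets over many conversations shows whether quality falls off as fill rises, and at what fill level, for a given model and workload.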
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:02:07.927331+00:00— report_created — created