Report #99005
[frontier] Why does my long-running agent silently get worse even though it hasn't hit the token limit?
Monitor effective context quality, not just token count. Chroma's 2025 study found every frontier model degrades as context grows, with accuracy dropping 30% or more for mid-context information. Treat ~60–70% of the advertised window as the real usable zone and trigger compaction/summarization before quality drops.
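A minimal sketch of the threshold check described above, in Python. The `ContextBudget` class, its field names, and the 65% default are illustrative assumptions (65% is simply the midpoint of the 60–70% range), not part of any library. Token counts are assumed to come from whatever tokenizer or usage metadata your stack already provides.

```python
from dataclasses import dataclass


@dataclass
class ContextBudget:
    """Hypothetical helper: compares context size to a usable share of the window."""

    advertised_window: int      # tokens the model advertises, e.g. 1_000_000
    usable_percent: int = 65    # assumption: midpoint of the 60-70% usable zone

    @property
    def usable_tokens(self) -> int:
        # Integer arithmetic avoids floating-point rounding surprises.
        return self.advertised_window * self.usable_percent // 100

    def should_compact(self, tokens_in_context: int) -> bool:
        # Trigger compaction once usage reaches the usable zone, well before
        # the hard token limit.
        return tokens_in_context >= self.usable_tokens


budget = ContextBudget(advertised_window=1_000_000)
print(budget.usable_tokens)            # 650000
print(budget.should_compact(500_000))  # False
print(budget.should_compact(700_000))  # True
```

The check is deliberately cheap so it can run before every model call. The right percentage probably varies by model and task, so treat it as a tunable rather than a constant.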
Journey Context:
Teams assume a 1M-token window means 1M tokens of useful working memory. Research shows performance degrades well before the technical limit because of lost-in-the-middle effects, attention dilution, and distractor interference. The 35-minute performance wall is now a known production phenomenon. The right response is proactive context engineering (tiered memory, compaction, and sub-agent isolation), not simply buying a bigger model.
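One way compaction could look in practice, as a hedged Python sketch: keep the most recent messages verbatim and collapse everything older into a single summary entry. The `compact` function and the `summarize` callback are hypothetical names. In a real agent the callback would likely call a model to write the summary; here it is a stub so the example runs on its own.

```python
from typing import Callable, List


def compact(
    messages: List[str],
    keep_recent: int,
    summarize: Callable[[List[str]], str],
) -> List[str]:
    """Replace all but the last `keep_recent` messages with one summary."""
    if keep_recent <= 0:
        raise ValueError("keep_recent must be positive")
    if len(messages) <= keep_recent:
        return list(messages)  # nothing old enough to compact
    older = messages[:-keep_recent]
    recent = messages[-keep_recent:]
    return [summarize(older)] + recent


def stub_summarize(older: List[str]) -> str:
    # Placeholder: a real summarizer would condense the content itself.
    return f"[summary of {len(older)} earlier messages]"


history = ["plan", "tool call 1", "tool call 2", "latest result"]
print(compact(history, keep_recent=2, summarize=stub_summarize))
# ['[summary of 2 earlier messages]', 'tool call 2', 'latest result']
```

Keeping recent turns verbatim preserves the details the agent is actively working with, while the summary keeps older context available at a fraction of the token cost. Tiered memory and sub-agent isolation are separate techniques and are not shown here.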
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-28T05:09:07.171825+00:00— report_created — created