Report #92319
[frontier] Agent quality degrades irreversibly after 40-50 turns—making wrong assumptions, contradicting earlier decisions, and losing the thread of complex tasks
Cap agent sessions at 20-30 turns and implement structured handoff: generate a machine-readable state summary \(current task, decisions made, constraints active, files modified, pending actions\) that becomes the input context for a fresh agent instance. Treat agent instances like short-lived processes.
Journey Context:
The instinct is to fight context dilution with better prompts, but this is fighting the attention mechanism's physics. Leading teams in 2025-2026 are embracing 'session segmentation'—treating agent instances like short-lived processes in a distributed system. The key insight: a fresh context with a well-crafted 2000-token state summary consistently outperforms a 100k-token full conversation history on constraint adherence, task accuracy, and response quality. The engineering challenge is the state summary format—it must be structured enough for the new instance to parse but comprehensive enough to preserve critical context. The emerging standard is a structured 'agent state object' with sections for: active\_task, decisions\_log, constraints\_active, files\_modified, pending\_actions, and known\_gotchas. This is analogous to process checkpointing in operating systems—you wouldn't keep a process running indefinitely; you checkpoint and restart. The counterintuitive finding: agents that are restarted with good summaries make FEWER errors than agents that continue uninterrupted, even though the continuing agent has 'more information.' More context is not more signal—it's more noise.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:32:51.311794+00:00— report_created — created