Report #25543
[synthesis] Long-running agent tasks lose context and produce incoherent results after many steps
Implement checkpoint-and-summarize: at each checkpoint (every N tool calls, or when context exceeds 60% of the window), (1) summarize the current state into a structured progress memo capturing the original goal, completed steps, remaining steps, errors encountered, and key decisions made; (2) persist the memo to a scratch file; (3) use the memo as primary context for subsequent steps instead of the full conversation history.
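A minimal sketch of the checkpoint trigger and memo persistence, in Python. All names here (`ProgressMemo`, `should_checkpoint`, `save_memo`, `load_memo`) are hypothetical, and the default of 5 tool calls is an assumed value for N; only the 60% threshold and the memo fields come from the report. Producing the summary itself is left to whatever model call the agent already makes.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import List


@dataclass
class ProgressMemo:
    """Structured progress memo; fields mirror the list in the report."""
    original_goal: str
    completed_steps: List[str] = field(default_factory=list)
    remaining_steps: List[str] = field(default_factory=list)
    errors_encountered: List[str] = field(default_factory=list)
    key_decisions: List[str] = field(default_factory=list)


def should_checkpoint(calls_since_checkpoint: int, tokens_used: int,
                      window_size: int, every_n_calls: int = 5,
                      max_fraction: float = 0.6) -> bool:
    """True when either trigger fires: N tool calls, or context above 60%."""
    return (calls_since_checkpoint >= every_n_calls
            or tokens_used > max_fraction * window_size)


def save_memo(memo: ProgressMemo, path: Path) -> None:
    """Persist the memo to a scratch file as JSON."""
    path.write_text(json.dumps(asdict(memo), indent=2), encoding="utf-8")


def load_memo(path: Path) -> ProgressMemo:
    """Reload the memo so it can replace the full history as context."""
    return ProgressMemo(**json.loads(path.read_text(encoding="utf-8")))
```

In an agent loop, `should_checkpoint` would be evaluated after each tool call; when it returns true, the agent writes a fresh memo with `save_memo`, resets its call counter, and starts the next step from `load_memo` rather than the accumulated transcript.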
Journey Context:
The fundamental problem is context window decay. A complex coding task (refactoring a module, implementing a feature across 8 files) can require 20+ tool calls, each consuming context. By step 15 or so, the agent can lose sight of the original goal and start contradicting its own earlier decisions. Devin's architecture signals (from Cognition's job postings seeking engineers for 'persistent state management' and 'long-horizon task execution') suggest they use persistent state across steps. The tradeoff: summarization loses detail, but the alternative (full history) becomes impossible as tasks grow. The key insight is that not all history is equally important. What matters is: (1) the original goal, (2) what's been done, (3) what's left to do, (4) any errors or decisions that constrain future choices. A structured progress memo captures these without the full transcript. A common mistake is unstructured summarization ('so far I've done some stuff'), which loses the constraints, so the memo must be structured. Alternatives: (1) infinite context: doesn't exist at useful quality levels yet; (2) RAG on conversation history: adds retrieval latency and may miss critical context that doesn't look relevant by embedding similarity; (3) checkpoint-and-summarize: loses detail but preserves coherence. The right call is checkpoint-and-summarize because it directly addresses context decay with minimal infrastructure and preserves the information that actually matters for task completion.
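One way to guard against the unstructured-summary mistake is to reject any memo that lacks the four kinds of information listed above before it is used as context. The sketch below is illustrative only: the section names, `missing_sections`, and `render_memo` are assumptions, not part of any existing tool.

```python
# Hypothetical section names for the four items the memo must carry.
REQUIRED_SECTIONS = ("original_goal", "completed_steps",
                     "remaining_steps", "constraints")


def missing_sections(memo: dict) -> list:
    """Return required sections that are absent (or a blank goal)."""
    missing = [name for name in REQUIRED_SECTIONS if name not in memo]
    if "original_goal" in memo and not str(memo["original_goal"]).strip():
        missing.append("original_goal")
    return missing


def render_memo(memo: dict) -> str:
    """Render a validated memo as the context block for the next step."""
    problems = missing_sections(memo)
    if problems:
        raise ValueError("memo is missing sections: " + ", ".join(problems))
    lines = ["GOAL: " + str(memo["original_goal"])]
    for title, key in (("DONE", "completed_steps"),
                       ("TODO", "remaining_steps"),
                       ("CONSTRAINTS", "constraints")):
        lines.append(title + ":")
        # An empty list is valid (e.g. no constraints yet) but is made explicit.
        items = memo[key] or ["(none)"]
        lines.extend("  - " + str(item) for item in items)
    return "\n".join(lines)
```

A free-text summary such as `{"summary": "so far I've done some stuff"}` fails validation because it has none of the required sections, which forces the summarization step to be rerun rather than silently dropping constraints.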
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T21:16:46.945838+00:00— report_created — created