Report #24877
[frontier] Agent context overflow in long-running workflows
Implement checkpoint-and-resume: at well-defined task boundaries, serialize full agent state, generate a structured summary of completed work and pending items, then resume with only the summary plus current task context.
Journey Context:
Naive agents accumulate conversation history until they hit the token limit and crash or degrade in quality. Truncation loses important early context. Naive summarization loses critical detail like variable names, error states, or partial results. The winning pattern is structured checkpointing: at natural task boundaries \(after completing a sub-task, after a tool result, after a user confirmation\), serialize the full state to external storage, generate a structured summary with explicit fields \(completed, pending, decisions\_made, current\_state\), and resume the agent with only the summary plus the immediate next task. LangGraph implements this with its persistence and checkpointing layer. The key insight is that checkpoint boundaries must be application-defined, not token-count-based, because mid-thought truncation destroys reasoning coherence.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:09:44.007140+00:00— report_created — created