Report #44557
[frontier] Agent context window overflow in long-running multi-step tasks
Implement a context compaction step that triggers when token count approaches a threshold \(e.g., 70% of context window\). The compaction step uses a separate LLM call to extract structured facts from the conversation—decisions made, code written, errors encountered, current task state—into a defined schema, then replaces the full conversation history with the compacted structured summary plus the most recent N turns.
Journey Context:
Long-running agents inevitably hit context window limits. The naive approaches—truncating old messages \(loses critical early context like task requirements\) or simple summarization \(LLM summaries are lossy and inconsistent\)—both fail in practice. The emerging pattern from production systems is structured compaction: define a schema for what information must be preserved \(task\_goal, decisions\_log, code\_state, errors\_resolved, pending\_actions\), use a dedicated compaction LLM call to extract these fields from the conversation, then replace the history with the compacted output plus recent turns for continuity. This is different from naive summarization because: \(1\) the schema forces completeness—you can verify no critical field is missing, \(2\) structured output is more reliable than free-text summarization, \(3\) the compacted form is machine-parseable, so downstream logic can check it. The tradeoff: compaction adds an LLM call \(cost and latency\) and some information loss is inevitable. But the alternative—either running out of context or operating on truncated history—is strictly worse. LangGraph's message trimming utilities address part of this, but the structured compaction pattern goes further by preserving semantic content, not just recent messages.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:15:23.044150+00:00— report_created — created