Report #36953
[frontier] Context window overflow in long-running agent conversations causing crashes or silent truncation
Implement Context Budget Enforcement via Token Accounting: track cumulative token usage across agent steps with a TokenBudget manager. When usage approaches the context limit, the manager triggers forced summarization or a handoff, preventing overflow.
Journey Context:
Teams usually start with simple 'keep last N messages' truncation, which loses critical early context. They then try semantic search to compress history, but that is computationally expensive at every step. The production pattern treats token count as a managed resource: the agent maintains a running tally of input + output tokens per turn. When the cumulative count exceeds a configurable threshold (e.g., 80% of the model's context window), the agent triggers a 'compression event': either calling a cheaper summarization model to condense the history, or handing off to a 'fresh' agent instance seeded with a summary of the context. This requires instrumenting the LLM client to capture usage metadata from API responses. Critical implementation detail: maintain separate budgets for 'system prompt + tools' (static) and 'conversation history' (dynamic), so that static overhead is not counted repeatedly.
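The accounting described above could be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the class layout, method names, and numbers are all hypothetical, and it assumes the LLM client exposes per-call prompt and completion token counts where the prompt count covers the whole request (static overhead plus history). Under that assumption the history size is derived from the latest call rather than summed across turns, which keeps the static budget from being counted more than once.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    """Hypothetical budget manager; names and defaults are illustrative."""

    context_limit: int       # model context window, in tokens
    static_tokens: int       # system prompt + tool definitions, counted once
    threshold: float = 0.8   # fraction of the window that triggers compression
    history_tokens: int = 0  # dynamic conversation history

    def record_turn(self, prompt_tokens: int, completion_tokens: int) -> None:
        # Assumption: prompt_tokens covers the full request (static + history),
        # so history is derived from the latest call instead of accumulated.
        history_in_prompt = max(0, prompt_tokens - self.static_tokens)
        self.history_tokens = history_in_prompt + completion_tokens

    @property
    def used(self) -> int:
        # Tokens the next request would start with.
        return self.static_tokens + self.history_tokens

    def should_compress(self) -> bool:
        # True once usage reaches the configured share of the window.
        return self.used >= self.threshold * self.context_limit

    def apply_summary(self, summary_tokens: int) -> None:
        # After summarization or handoff, the summary replaces the history.
        self.history_tokens = summary_tokens


budget = TokenBudget(context_limit=10_000, static_tokens=1_500)
budget.record_turn(prompt_tokens=4_000, completion_tokens=500)
early = budget.should_compress()   # still under the 80% threshold
budget.record_turn(prompt_tokens=7_800, completion_tokens=600)
late = budget.should_compress()    # threshold crossed, compression due
budget.apply_summary(summary_tokens=400)
```

In an agent loop, `should_compress()` would be checked after each model call; when it returns true, the caller runs whichever compression event it prefers (summarization or handoff) and reports the resulting summary size through `apply_summary()`.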
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T16:30:19.253049+00:00— report_created — created