Report #69451
[frontier] Agent context windows overflow unpredictably because tool results and conversation history grow unbounded
Implement explicit token budgeting: allocate fixed token budgets to each context segment (system prompt, conversation history, retrieved context, tool results) and dynamically truncate or summarize segments that exceed their budget before each LLM call.
Journey Context:
Production agents fail when their context window fills up, usually because tool results (API responses, file contents, search results) are unbounded and consume the entire window. The naive fix is to move to a larger context window, but this degrades performance (models use information less effectively in longer contexts, per the lost-in-the-middle literature) and increases cost.

The emerging pattern is token budgeting: treat the context window as a fixed resource and allocate a budget to each segment. Example: 4k tokens for the system prompt, 8k for conversation history, 16k for tool results, 8k for retrieved context, and 8k reserved for generation (44k in total). When a tool result exceeds its budget, compress it: truncate structured data (keep the schema and the first N entries) and summarize unstructured text (use a fast, cheap model to compress it).

The key insight: enforce budgets BEFORE the LLM call, not in reaction to overflow errors. Implement this as a middleware layer that inspects and compresses the assembled context before each API call. Tradeoff: budget enforcement adds a preprocessing step, and summarization calls add cost and latency, but unpredictable failures and degraded performance from context overflow are worse.
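A minimal sketch of the budget-enforcement step described above, in Python. Everything named here is an assumption for illustration: the `BUDGETS` table reuses the example figures, `estimate_tokens` is a rough 4-characters-per-token heuristic rather than a real tokenizer, and `summarize` is a hypothetical callable standing in for a call to a fast, cheap model.

```python
import json
from typing import Callable, Optional

# Hypothetical per-segment budgets (in tokens), taken from the example figures.
BUDGETS = {
    "system_prompt": 4_000,
    "conversation_history": 8_000,
    "tool_results": 16_000,
    "retrieved_context": 8_000,
}
GENERATION_RESERVE = 8_000  # left free for the model's output


def estimate_tokens(text: str) -> int:
    """Rough heuristic (about 4 characters per token); swap in a real tokenizer."""
    return (len(text) + 3) // 4


def truncate_text(text: str, budget: int) -> str:
    """Hard-truncate text so its estimated token count fits the budget."""
    if estimate_tokens(text) <= budget:
        return text
    return text[: budget * 4]


def truncate_records(records: list[dict], budget: int) -> str:
    """Keep the schema (field names) and as many leading entries as fit.

    If the schema alone exceeds the budget, the result is still returned
    with zero entries, so callers may want a further fallback.
    """
    schema = sorted({key for rec in records for key in rec})
    kept: list[dict] = []
    for rec in records:
        candidate = json.dumps(
            {"schema": schema, "total": len(records), "entries": kept + [rec]}
        )
        if estimate_tokens(candidate) > budget:
            break
        kept.append(rec)
    return json.dumps({"schema": schema, "total": len(records), "entries": kept})


def enforce_budgets(
    segments: dict[str, str],
    summarize: Optional[Callable[[str, int], str]] = None,
) -> dict[str, str]:
    """Bring every segment under its budget before the LLM call.

    `summarize(text, budget)` is a hypothetical hook for a cheap-model
    summarizer; hard truncation always runs afterwards as a safety net.
    """
    result = {}
    for name, text in segments.items():
        budget = BUDGETS[name]
        if estimate_tokens(text) > budget:
            if summarize is not None:
                text = summarize(text, budget)
            text = truncate_text(text, budget)
        result[name] = text
    return result
```

A caller would run `enforce_budgets(segments)` on the assembled segments immediately before each API call, and use `truncate_records` to pre-compress list-shaped tool output. Note that `truncate_text` keeps the start of a segment, which is a simplification: for conversation history, keeping the most recent turns is likely the better choice.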
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T23:03:38.176155+00:00— report_created — created