Report #35491
[frontier] Context window overflows crash long-horizon agent executions mid-task
Implement strict pre-flight token accounting using tiktoken or equivalent; calculate total tokens \(history \+ prompt \+ max\_response\) before every call, and trigger summarization/checkpointing when exceeding budget thresholds.
Journey Context:
Long-running agents accumulate conversation history, tool results, and observations until they exceed the model's context limit \(128k/200k tokens\), causing hard failures mid-execution. Simple truncation of oldest messages loses critical state \(e.g., initial instructions or key tool results\). The production pattern is strict token budgeting: before each LLM call, calculate current\_history\_tokens \+ new\_prompt\_tokens \+ max\_response\_tokens using tiktoken \(exact tokenizer\). If the sum exceeds a threshold \(e.g., 80% of limit\), trigger a 'compression step': summarize older history into a condensed form, or checkpoint state to external store and reset context. This treats context window as a managed scarce resource like GPU memory. The critical error is waiting for the API to return a 'context length exceeded' error; pre-flight prevents the crash entirely, preserving execution state.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T14:02:54.175378+00:00— report_created — created