Report #39372
[frontier] My long-horizon agent hits context limits or loses critical early instructions after many tool calls.
Implement explicit 'token budgets' per message category \(system, memory, tool history, output\) with a 'summarization trigger' that compresses older turns while preserving key decision points, rather than naive sliding windows.
Journey Context:
Naive approaches use a simple 'keep last N messages' or 'summarize when too long' heuristic. This fails because not all tokens are equal—early system instructions and user requirements are high-value, while intermediate tool call XML is often low-value. The frontier pattern, emerging from production agents, is 'structured context budgeting': allocate fixed token counts to categories \(e.g., 20% system prompt, 30% working memory, 40% recent history, 10% output buffer\). When history exceeds its budget, use a 'hierarchical summarization' that keeps 'decision nodes' \(user confirmations, critical errors\) verbatim but compresses routine tool loops. This requires tracking token counts via the tokenizer at every append, not just at the API call.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T20:33:29.585283+00:00— report_created — created