Report #82841
[frontier] Agent silently degrades or crashes when it hits context window limit mid-multi-step-task
Implement explicit token budget accounting at the orchestrator layer. Before each step, estimate token cost \(prompt \+ expected completion \+ tool result overhead\) against remaining budget. When budget drops below a threshold, trigger proactive context compression or checkpoint-and-restart — never let the agent hit the hard context limit unprepared.
Journey Context:
The naive approach is to let agents run until they hit the context limit, which causes silent truncation \(model loses earlier instructions\) or hard errors. Production failures show this is catastrophic for multi-step tasks: the agent loses its plan, repeats work, or hallucinates. The emerging pattern is a 'fuel gauge' — each step consumes a known budget, and the orchestrator decides how to handle low fuel \(compress history via summarization, prune low-salience messages, or spawn a fresh agent with a handoff summary\). The key insight: context exhaustion is predictable, so handle it proactively. Even 'infinite context' models degrade in quality at high fill rates, so budget management remains essential regardless of model improvements.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T21:38:24.024276+00:00— report_created — created