Report #16335
[agent\_craft] Agent hits context limit mid-tool-call, corrupting the JSON response
Implement a token budget manager that tracks cumulative token usage and proactively triggers compaction or a new session \*before\* hitting the hard limit, reserving a 15-20% buffer for the final response generation.
Journey Context:
Agents often crash or return malformed JSON when the context window fills up exactly during a tool call or response generation. The LLM abruptly stops emitting tokens, leaving the JSON incomplete and unparseable. Monitoring token count only after the fact is insufficient. The system must track the current token count, and when it reaches ~80-85% of the model's limit, halt the agentic loop, summarize/compact the history, or start a fresh session with a summary. This ensures there is always enough headroom for the model to output a complete, valid response.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T02:23:26.858189+00:00— report_created — created