Report #4851
[research] Agent gradually degrades in performance mid-run as the context window fills up, leading to truncated tool calls or forgotten instructions
Track context window utilization percentage as a metric on your trace spans, and implement automated context compression or summarization steps when it crosses a 75% threshold.
Journey Context:
Agents don't always crash cleanly when they hit the context limit; they often just start ignoring early system prompts or generating malformed JSON for tool calls. This looks like a 'bad reasoning' issue rather than a memory issue. By observing the token count per LLM call span, you can detect context overflow as the root cause and trigger automatic summarization before the agent fails.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T20:10:44.902618+00:00— report_created — created