Report #17336
[research] Agent performance degrades mid-run due to context window bloat, misdiagnosed as reasoning failure
Track token_count and context_window_utilization as span metrics on every LLM call within the agent trace. Set an alert threshold at 70% context utilization. When breached, programmatically trigger a summarization step or context truncation routine before the next LLM call.
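A minimal sketch of the check described above, assuming a plain list of message strings and a caller-supplied llm_call function. The names SpanMetrics, estimate_tokens, truncate_oldest and guarded_call are hypothetical, and the 4-characters-per-token estimate is a crude stand-in for the model's real tokenizer. Only the 70% threshold comes from this report.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

ALERT_THRESHOLD = 0.70  # alert threshold from the report


@dataclass
class SpanMetrics:
    """Per-call metrics to attach to the LLM span (hypothetical shape)."""
    token_count: int
    context_window_utilization: float
    compacted: bool


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token. Swap in the
    # model's real tokenizer for accurate counts.
    return max(1, len(text) // 4)


def utilization(messages: List[str], context_window: int) -> Tuple[int, float]:
    tokens = sum(estimate_tokens(m) for m in messages)
    return tokens, tokens / context_window


def truncate_oldest(messages: List[str], context_window: int,
                    target: float = 0.50) -> List[str]:
    # Keep messages[0] (assumed to hold the instructions) and drop the
    # oldest later messages until utilization is at or below the target.
    kept = list(messages)
    while len(kept) > 2 and utilization(kept, context_window)[1] > target:
        del kept[1]
    return kept


def guarded_call(messages: List[str], context_window: int,
                 llm_call: Callable[[List[str]], str],
                 spans: Optional[List[SpanMetrics]] = None,
                 compact: Callable[[List[str], int], List[str]] = truncate_oldest,
                 ) -> Tuple[str, List[str]]:
    """Measure utilization, compact if over threshold, then call the LLM."""
    tokens, util = utilization(messages, context_window)
    compacted = util > ALERT_THRESHOLD
    if compacted:
        messages = compact(messages, context_window)
        tokens, util = utilization(messages, context_window)
    if spans is not None:
        # Metrics describe the prompt that is actually sent.
        spans.append(SpanMetrics(tokens, util, compacted))
    return llm_call(messages), messages
```

In a real agent, the spans list would be replaced by whatever tracing backend is in use, and compact could be a summarization step rather than truncation. Truncation is shown here only because it needs no extra model call.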
Journey Context:
As agents accumulate tool outputs, the prompt grows with every step. At high context lengths, LLMs suffer from lost-in-the-middle effects and degraded instruction following. This shows up as the agent suddenly forgetting instructions or failing to use tools properly partway through a run. Without token telemetry, the degradation is misdiagnosed as a bad prompt or a reasoning failure, when it is actually a context management failure.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T05:11:46.376518+00:00— report_created — created