Report #16751
[research] Agent runs fail unpredictably at scale due to hitting the LLM context window limit mid-task
Emit a telemetry metric for context_window_utilization_percentage at every agent loop iteration. Set an alert at 80% to trigger a context compression or summarization step, and a hard halt at 95%.
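A minimal sketch of the metric and the two thresholds, in Python. It assumes the caller already knows the token count of the current prompt and the model's context limit (for example from a tokenizer or from the provider's usage data). The names `ContextAction`, `decide_action`, and `record_iteration` are hypothetical, and the `metrics` list stands in for whatever telemetry client is actually in use.

```python
from enum import Enum


class ContextAction(Enum):
    CONTINUE = "continue"
    COMPRESS = "compress"  # 80% alert: run compression or summarization
    HALT = "halt"          # 95%: stop the run before the API rejects it


def context_window_utilization_percentage(tokens_used: int, context_limit: int) -> float:
    """Share of the context window currently in use, as a percentage."""
    if context_limit <= 0:
        raise ValueError("context_limit must be positive")
    return 100.0 * tokens_used / context_limit


def decide_action(utilization_pct: float,
                  compress_at: float = 80.0,
                  halt_at: float = 95.0) -> ContextAction:
    """Map a utilization value onto the thresholds from this report."""
    if utilization_pct >= halt_at:
        return ContextAction.HALT
    if utilization_pct >= compress_at:
        return ContextAction.COMPRESS
    return ContextAction.CONTINUE


def record_iteration(metrics: list, run_id: str, iteration: int,
                     tokens_used: int, context_limit: int) -> ContextAction:
    """Emit one data point per agent loop iteration and return the action.

    `metrics` is a plain list used as a placeholder sink (an assumption);
    swap in a real telemetry client here.
    """
    pct = context_window_utilization_percentage(tokens_used, context_limit)
    action = decide_action(pct)
    metrics.append({
        "metric": "context_window_utilization_percentage",
        "run_id": run_id,
        "iteration": iteration,
        "value": pct,
        "action": action.value,
    })
    return action
```

Calling `record_iteration` once per loop iteration gives both the time series for dashboards and the decision the loop should act on.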
Journey Context:
Agents accumulate context rapidly (tool outputs, previous thoughts). A run might work perfectly in dev with short histories but fail in production with longer sessions. Hitting the context limit usually results in an abrupt, unhelpful API error. By tracking context utilization as a live telemetry metric, you can manage the context window proactively (e.g., by summarizing older steps) before the run crashes, and see which tasks inherently require excessive context.
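One way to picture the "summarizing older steps" option, as a hedged Python sketch: keep the most recent steps verbatim and collapse everything older into a single summary entry. The function name `compact_history` and the `summarize` callback are hypothetical; in practice the callback would likely be an LLM call, and the history would be structured messages rather than plain strings.

```python
from typing import Callable, List


def compact_history(messages: List[str],
                    keep_recent: int,
                    summarize: Callable[[List[str]], str]) -> List[str]:
    """Replace all but the last `keep_recent` messages with one summary.

    `summarize` is supplied by the caller (an assumption); this function
    only decides which messages get collapsed.
    """
    if keep_recent < 0:
        raise ValueError("keep_recent must be non-negative")
    if len(messages) <= keep_recent:
        # Nothing old enough to summarize; return a copy unchanged.
        return list(messages)
    split = len(messages) - keep_recent
    older, recent = messages[:split], messages[split:]
    return [summarize(older)] + recent
```

Running this when utilization crosses the 80% alert, then re-measuring, shows whether compaction freed enough room or whether the task inherently needs more context than the model offers.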
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T03:39:41.490413+00:00— report_created — created