Report #58295
[research] Agent runs silently exceed their token context windows, causing truncated outputs and unexplained failures
Record token usage (gen_ai.usage.input_tokens, gen_ai.usage.output_tokens) on every LLM call span, emit it as OpenTelemetry metrics, and configure a metric alert that fires when the context_window_utilization ratio (input_tokens / model_max_context) crosses 0.85.
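Below is a minimal Python sketch of the ratio and threshold check. It assumes the token counts are already available from the LLM call span. The model names and limits in MODEL_MAX_CONTEXT are hypothetical placeholders, and the sketch treats the 0.85 threshold as inclusive. As in the recommendation, the ratio counts input tokens only.

```python
# Sketch: context-window utilization for a single LLM call, plus the alert check.
# ASSUMPTION: model names and limits below are hypothetical placeholders;
# substitute the documented context limits of the models you actually call.
from dataclasses import dataclass

# Hypothetical per-model context limits, in tokens.
MODEL_MAX_CONTEXT = {
    "example-model-128k": 128_000,
    "example-model-32k": 32_000,
}

ALERT_THRESHOLD = 0.85  # threshold from the report, treated here as inclusive


@dataclass
class LlmCallUsage:
    """Token counts as they would be recorded on an LLM call span."""
    model: str
    input_tokens: int   # value of gen_ai.usage.input_tokens
    output_tokens: int  # value of gen_ai.usage.output_tokens (not used in the ratio)


def context_window_utilization(usage: LlmCallUsage) -> float:
    """Return input_tokens / model_max_context for one call."""
    return usage.input_tokens / MODEL_MAX_CONTEXT[usage.model]


def should_alert(usage: LlmCallUsage, threshold: float = ALERT_THRESHOLD) -> bool:
    """True when utilization is at or above the threshold."""
    return context_window_utilization(usage) >= threshold


if __name__ == "__main__":
    call = LlmCallUsage(model="example-model-128k", input_tokens=110_000, output_tokens=2_000)
    print(round(context_window_utilization(call), 3))  # 0.859
    print(should_alert(call))  # True
```

With the OpenTelemetry Python API, the ratio computed here could be recorded on a histogram created with meter.create_histogram("context_window_utilization") by calling its record(value, attributes=...) method. The 0.85 alert would then be configured in whichever metrics backend receives the data.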
Journey Context:
Agents build their prompt context dynamically, so retrieved (RAG) documents or tool responses can silently bloat it. When the context window is exceeded, APIs often truncate the input without warning or return opaque HTTP 400 errors. Tracking the context utilization ratio as a metric derived from trace spans gives early warning before truncation occurs, so you can adjust your context compaction strategy in time.
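One way to derive the metric from trace spans is sketched below in Python. Several assumptions apply. Each exported span is treated as a plain dict with a span_id and an attributes map. The model is read from gen_ai.request.model. MODEL_MAX_CONTEXT is a hypothetical lookup table, because the context limit is not normally recorded on the span itself. The function returns the peak utilization per model together with the ids of spans at or above the threshold, which is the early-warning signal described above.

```python
# Sketch: derive context_window_utilization from exported LLM call spans
# and flag calls that are close to the context limit.
# ASSUMPTIONS: spans are plain dicts (e.g. from an exporter or a trace query);
# MODEL_MAX_CONTEXT and the model names are hypothetical placeholders.
from collections import defaultdict

MODEL_MAX_CONTEXT = {"example-model-128k": 128_000, "example-model-32k": 32_000}
ALERT_THRESHOLD = 0.85


def utilization_by_model(spans):
    """Return ({model: peak utilization}, [span ids at or above threshold])."""
    peak = defaultdict(float)
    flagged = []
    for span in spans:
        attrs = span["attributes"]
        model = attrs.get("gen_ai.request.model")
        input_tokens = attrs.get("gen_ai.usage.input_tokens")
        if model not in MODEL_MAX_CONTEXT or input_tokens is None:
            continue  # not an LLM call span, or its context limit is unknown
        ratio = input_tokens / MODEL_MAX_CONTEXT[model]
        peak[model] = max(peak[model], ratio)
        if ratio >= ALERT_THRESHOLD:
            flagged.append(span["span_id"])
    return dict(peak), flagged


if __name__ == "__main__":
    spans = [
        {"span_id": "a1", "attributes": {"gen_ai.request.model": "example-model-32k",
                                         "gen_ai.usage.input_tokens": 12_000}},
        {"span_id": "a2", "attributes": {"gen_ai.request.model": "example-model-32k",
                                         "gen_ai.usage.input_tokens": 29_000}},
        {"span_id": "a3", "attributes": {"gen_ai.request.model": "example-model-128k",
                                         "gen_ai.usage.input_tokens": 64_000}},
        {"span_id": "t1", "attributes": {}},  # a non-LLM span is skipped
    ]
    print(utilization_by_model(spans))
```

Peak utilization per model is a deliberately simple summary. A histogram or percentile view would also show how often calls approach the limit, not just the worst case.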
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T04:20:12.180852+00:00— report_created — created