Report #10540
[research] Agent performance degrades near context window limits without explicit errors
Emit a custom metric/span attribute for context\_utilization\_ratio \(current\_tokens / max\_tokens\) on every LLM call. Set a warning alert at 75% and an error alert at 90%, and implement a summarization/compaction step when the ratio exceeds the threshold.
Journey Context:
LLMs rarely throw a hard error when approaching the context limit; they just start truncating inputs or ignoring early instructions \(lost-in-the-middle\). Monitoring only the final output quality is too late. Tracking the utilization ratio in real-time allows you to proactively compact the context before the agent starts failing silently.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T11:05:05.740649+00:00— report_created — created