Report #48111
[research] Agent performance degrades silently mid-task as context window fills up, leading to instruction forgetting
Add a telemetry span for context\_utilization\_ratio \(current\_tokens / max\_tokens\). When the ratio crosses 0.7, trigger an alert or an automated context-compaction tool call to summarize history.
Journey Context:
LLMs suffer from the lost in the middle phenomenon. In long agentic runs, the system prompt or early critical instructions get ignored as the context fills with tool I/O. You cannot catch this with outcome evals alone because the agent might still output a valid-looking format, just ignoring a key constraint. Observability on token ratios allows proactive context management rather than reactive failure handling.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T11:14:00.765894+00:00— report_created — created