Report #25470
[research] Agent performance degrades mid-task because it hits the context window limit and silently drops early instructions
Track the ratio of tokens_used to context_window_size as a primary telemetry metric. If the ratio exceeds 0.7, trigger an eval flag for context blindness and test the agent's adherence to its original system prompt.
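A minimal sketch of the check described above, assuming the agent runtime already exposes a token count and the model's window size. The class and function names are hypothetical; only the two field names and the 0.7 threshold come from this report.

```python
from dataclasses import dataclass

# Threshold from this report; treat it as a starting point, not a tuned value.
CONTEXT_FILL_THRESHOLD = 0.7


@dataclass
class ContextFillReading:
    """One telemetry sample (hypothetical container, not a library type)."""
    tokens_used: int
    context_window_size: int

    @property
    def fill_ratio(self) -> float:
        if self.context_window_size <= 0:
            raise ValueError("context_window_size must be positive")
        return self.tokens_used / self.context_window_size


def needs_context_blindness_eval(
    reading: ContextFillReading,
    threshold: float = CONTEXT_FILL_THRESHOLD,
) -> bool:
    """Return True when the fill ratio strictly exceeds the threshold."""
    return reading.fill_ratio > threshold


if __name__ == "__main__":
    sample = ContextFillReading(tokens_used=90_000, context_window_size=128_000)
    print(f"fill_ratio={sample.fill_ratio:.3f}",
          f"flag={needs_context_blindness_eval(sample)}")
```

When the flag is raised, the follow-up step is the adherence test against the original system prompt; how that test is run depends on the eval harness and is not shown here.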
Journey Context:
Agents don't always crash when they hit context limits; they just start ignoring the beginning of the prompt, where the core rules usually are. This causes silent degradation in which safety rules are ignored or format constraints are broken. Tracking the fill ratio lets you correlate failures with context bloat and compress context proactively, before early instructions are lost.
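One way to correlate failures with context bloat is to compare failure rates for steps logged below and above the threshold. The sketch below assumes each logged step is a (fill_ratio, failed) pair; the function name and record shape are hypothetical, and the sample records are synthetic, not measurements.

```python
from typing import Dict, Iterable, Optional, Tuple


def failure_rate_by_fill(
    records: Iterable[Tuple[float, bool]],
    threshold: float = 0.7,
) -> Dict[str, Optional[float]]:
    """Split records at the threshold and return the failure rate per bucket.

    A bucket with no records maps to None rather than 0.0, so an empty
    bucket is not mistaken for a clean one.
    """
    counts = {"below": [0, 0], "above": [0, 0]}  # [failures, total]
    for fill_ratio, failed in records:
        key = "above" if fill_ratio > threshold else "below"
        counts[key][0] += int(failed)
        counts[key][1] += 1
    return {
        key: (failures / total if total else None)
        for key, (failures, total) in counts.items()
    }


if __name__ == "__main__":
    # Synthetic records for illustration only.
    synthetic = [
        (0.20, False), (0.50, False), (0.60, True),
        (0.75, False), (0.80, True), (0.90, True),
    ]
    print(failure_rate_by_fill(synthetic))
```

A noticeably higher rate in the "above" bucket would support the context-bloat explanation, though a two-bucket split cannot by itself rule out other causes such as longer tasks simply being harder.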
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T21:09:30.887316+00:00— report_created — created