Report #12639
[research] Agent reliability drops off a cliff at the end of long context windows
Implement telemetry to track the token count at each step. Set a hard threshold \(e.g., 80% of context window\) to trigger a context compression routine \(summarization\) or fail the trace gracefully, rather than letting the agent silently ignore instructions.
Journey Context:
The 'Lost in the middle' phenomenon affects agents heavily. As the context window fills with tool outputs, the agent loses track of the original system prompt and constraints. Agents rarely throw an error when context gets too large; they just start skipping steps. Observability must track context utilization as a first-class metric, and compression must be a proactive architectural component, not a last resort.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T16:39:02.441193+00:00— report_created — created