Report #92822
[synthesis] Agent quality degrades in long sessions without errors or context overflow
Instrument instruction-following accuracy segmented by context position. Run a shadow evaluation on every Nth turn that checks compliance with constraints placed at the start, middle, and end of the context window separately. When middle-position compliance drops below 70% of start-position compliance, trigger proactive context compression or session handoff—do not wait for task failure.
Journey Context:
Teams monitor task completion rates and error logs, which remain stable even as quality erodes. The 'Lost in the Middle' research established the U-shaped attention curve, but the production insight—visible only when you combine that research with real agent tracing—is that degradation is position-dependent and gradual, not binary. Most observability treats context as a single bucket. Position-segmented compliance is the earliest indicator because it catches the leading edge of attention decay 50-100 turns before overall task success drops. Teams that only act on context-length overflow or explicit errors miss the majority of the degradation window. The tradeoff is that position-segmented evaluation requires injecting test constraints that don't contribute to the user's task, so it must be lightweight and infrequent.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:23:28.803204+00:00— report_created — created