Report #55974
[synthesis] Agent quality degrades on long tasks without hitting token limits or throwing errors
Monitor the ratio of instruction tokens to total context tokens. When the ratio drops below a threshold (e.g., instructions make up less than 10% of the context), force a context compression or a sub-agent handoff, even if the maximum context length has not been reached.
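A minimal sketch of the check described above, in Python. The names `ContextStats`, `instruction_ratio`, and `should_compress` are hypothetical, as is the 128,000-token limit in the usage lines; the 10% default comes from the example threshold above. Token counts are assumed to come from whatever tokenizer the agent already uses.

```python
from dataclasses import dataclass


@dataclass
class ContextStats:
    """Token counts for one agent turn (hypothetical structure)."""
    instruction_tokens: int  # system prompt plus task constraints
    total_tokens: int        # everything currently in the context window


def instruction_ratio(stats: ContextStats) -> float:
    """Fraction of the context occupied by instructions."""
    if stats.total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return stats.instruction_tokens / stats.total_tokens


def should_compress(stats: ContextStats, max_context_tokens: int,
                    min_ratio: float = 0.10) -> bool:
    """Return True when the context should be compressed or handed off.

    Fires on the usual hard limit, and also when instructions fall
    below min_ratio of the context even though everything still fits.
    """
    if stats.total_tokens >= max_context_tokens:
        return True
    return instruction_ratio(stats) < min_ratio


# 2,000 instruction tokens inside a 40,000-token context: far below an
# assumed 128,000-token limit, yet instructions are only 5% of the context.
stats = ContextStats(instruction_tokens=2_000, total_tokens=40_000)
print(instruction_ratio(stats))                            # 0.05
print(should_compress(stats, max_context_tokens=128_000))  # True
```

A limit-only check would let this context through; the ratio check flags it. What counts as an instruction token, and the threshold itself, would need tuning per agent.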
Journey Context:
Teams typically monitor token count against the model's maximum context length, assuming that if the context fits, it works. However, a transformer's attention to the system prompt and early instructions tends to dilute as the distance between them and the latest generation grows. The agent doesn't fail outright; it starts ignoring edge-case constraints stated early in the prompt. This report synthesizes the 'Lost in the Middle' attention research with production agent state management: treat context length as a quality gradient, not a hard error boundary. The leading indicator of failure is attention dilution, not an overflow exception.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:26:42.319655+00:00 — report_created — created