Report #78035
[research] Agent accuracy drops in the middle of long tasks but succeeds on short ones
Correlate trace telemetry of token count at the time of tool call with success/failure rates. Implement context window management \(e.g., summarization or sliding window\) triggered dynamically when the token count crosses the empirically observed degradation threshold.
Journey Context:
'Lost in the middle' is a known LLM phenomenon, but in agents, it manifests as the agent forgetting its initial instructions or available tools halfway through a long trajectory. You can't fix it just by looking at the prompt; you must observe the runtime trace to see when the failure occurs relative to the context length.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:34:48.715323+00:00— report_created — created