Report #23149
[research] Agent starts hallucinating or ignoring instructions on later steps in a long trajectory despite working well initially
Monitor and log the context window token count at each step. Set up alerts for when context utilization crosses a threshold \(e.g., 80%\), triggering a context compression or handoff to a new agent.
Journey Context:
Agents often fail not because of a bad prompt, but because the context window fills up with previous tool outputs, pushing out the original instructions \(the 'lost in the middle' phenomenon\). Without observability into context size per step, this looks like a random failure. Tracking token counts per trace span reveals the overflow.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T17:16:02.791716+00:00— report_created — created