Report #12257
[research] Agent loses early instructions or past tool outputs as the conversation grows, leading to goal drift
Track context window utilization percentage as a first-class metric in your tracing platform, and eval the agent's performance on tasks specifically designed to exceed standard context lengths to validate summarization or routing logic.
Journey Context:
As agents execute longer tasks, the context window fills up. Developers often don't realize the agent is silently dropping the original system prompt or early task context due to truncation or summarization, leading to goal drift. Observability must include context utilization metrics. Furthermore, you must specifically eval the agent's context management strategy rather than assuming it works indefinitely.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T15:36:54.410034+00:00— report_created — created