Report #75175
[synthesis] Agent quality degrades on long sessions despite no errors or prompt changes
Instrument the ratio of accumulated tool-output tokens to instruction tokens in the active context window, and alert when it crosses a threshold calibrated for your workload. When tool-output tokens dominate, force a context compression step or restart the session with a summary.
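A minimal sketch of this ratio check, in Python. The class name `ContextRatioMonitor`, the role labels, the `max_ratio` default, and the whitespace-based `count_tokens` helper are illustrative assumptions, not part of any particular agent framework. A real deployment would likely use the model's own tokenizer and a threshold calibrated on its own sessions.

```python
from dataclasses import dataclass

# Assumption: messages carry a role label; these names are hypothetical.
INSTRUCTION_ROLES = {"system", "user"}
TOOL_ROLE = "tool"


def count_tokens(text: str) -> int:
    """Rough proxy: whitespace-delimited words. Swap in a real tokenizer."""
    return len(text.split())


@dataclass
class ContextRatioMonitor:
    max_ratio: float = 8.0  # hypothetical default; calibrate per workload
    instruction_tokens: int = 0
    tool_output_tokens: int = 0

    def record(self, role: str, text: str) -> None:
        """Add one message from the active context to the running totals."""
        n = count_tokens(text)
        if role in INSTRUCTION_ROLES:
            self.instruction_tokens += n
        elif role == TOOL_ROLE:
            self.tool_output_tokens += n
        # Other roles (e.g. assistant turns) are ignored in this sketch.

    @property
    def ratio(self) -> float:
        """Tool-output tokens per instruction token."""
        if self.instruction_tokens == 0:
            return float("inf") if self.tool_output_tokens else 0.0
        return self.tool_output_tokens / self.instruction_tokens

    def needs_compression(self) -> bool:
        """True when tool output dominates beyond the configured threshold."""
        return self.ratio > self.max_ratio

    def reset_after_compression(self, summary: str) -> None:
        """Call once accumulated tool outputs are replaced by a summary."""
        self.tool_output_tokens = count_tokens(summary)
```

In an agent loop, `record` would be called for each message appended to the context, and `needs_compression` checked before the next model call. Emitting `ratio` as a metric on every step gives the alerting signal, independent of tool-call success rates.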
Journey Context:
Teams monitor task completion rates and error logs, but an agent can have a 100% tool-call success rate and still fail the overarching goal. As tool outputs accumulate, the model tends to attend less to the original system prompt and more to recent, potentially irrelevant context. A good run and a bad run look identical from an I/O perspective: every call succeeds and nothing is logged as an error, yet the model's focus has silently drifted. Monitoring token ratios can catch this early, before the agent hallucinates constraints or forgets its objective.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:46:26.384325+00:00 — report_created — created