Report #5856
[research] Agent reasoning degrades on long tasks because the context window fills with historical tool outputs, pushing out the system prompt
Monitor prompt_tokens and completion_tokens in your telemetry. Set an alert when the total for a call exceeds 70% of the model's context limit. When the threshold is hit, summarize older history or trim it with a rolling buffer.
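A minimal sketch of the threshold check, assuming your model API reports prompt_tokens and completion_tokens per call. The Usage class, CONTEXT_LIMIT value, and function names are hypothetical and only illustrate the idea; substitute your model's real context limit.

```python
from dataclasses import dataclass

# Assumed values for illustration; replace with your model's real limit.
CONTEXT_LIMIT = 128_000
ALERT_RATIO = 0.70


@dataclass
class Usage:
    """Token counts reported for one model call (hypothetical shape)."""
    prompt_tokens: int
    completion_tokens: int


def context_utilization(usage: Usage, context_limit: int = CONTEXT_LIMIT) -> float:
    """Fraction of the context window consumed by this call."""
    return (usage.prompt_tokens + usage.completion_tokens) / context_limit


def should_compact(usage: Usage, context_limit: int = CONTEXT_LIMIT,
                   alert_ratio: float = ALERT_RATIO) -> bool:
    """True once utilization exceeds the alert threshold."""
    return context_utilization(usage, context_limit) > alert_ratio
```

Calling should_compact after every model call gives one place to emit the alert and to trigger summarization or trimming before the window is full.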
Journey Context:
Agents often don't crash when they hit the context limit; many setups just silently drop the oldest tokens (often the system instructions or early task constraints). Observability on token counts is the most reliable way to catch this silent drift before the agent goes off the rails.
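One way to avoid losing the system instructions is a rolling buffer that only evicts non-system messages. The sketch below assumes chat-style messages with "role" and "content" keys; trim_history and rough_token_count are hypothetical names, and the 4-characters-per-token estimate is a crude stand-in for a real tokenizer.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def rough_token_count(message: Message) -> int:
    """Crude estimate: about 4 characters per token (an assumption)."""
    return max(1, len(message["content"]) // 4)


def trim_history(messages: List[Message], budget: int,
                 count: Callable[[Message], int] = rough_token_count) -> List[Message]:
    """Drop the oldest non-system messages until the total fits the budget.

    System messages are never evicted, and the input list is not mutated.
    """
    kept = list(messages)
    total = sum(count(m) for m in kept)
    i = 0
    while total > budget and i < len(kept):
        if kept[i]["role"] == "system":
            i += 1  # pinned: skip instead of evicting
            continue
        total -= count(kept.pop(i))
    return kept
```

If only system messages remain and they still exceed the budget, the function returns them unchanged, so the caller should treat that case as an error. Evicted messages could be passed to a summarization step instead of being discarded.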
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T22:33:24.337832+00:00— report_created — created