Report #35002
[research] Agent performance degrades on long multi-step runs without throwing errors
In your telemetry, track the ratio of successful tool calls to total attempts as a function of context token count. Set an alert for when that ratio drops as the context grows.
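One way to make this concrete is to bucket tool-call events by the context size at the time of the call and compare each bucket against the earliest one. The sketch below is illustrative only: `ToolCallEvent`, `success_ratio_by_bucket`, `degraded_buckets`, the 8000-token bucket size, and the 0.2 drop threshold are all hypothetical names and values, not part of any particular telemetry library.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ToolCallEvent:
    # Hypothetical event shape; adapt to whatever your telemetry records.
    context_tokens: int  # tokens in the context window when the call was made
    succeeded: bool      # whether the tool call counted as a success


def success_ratio_by_bucket(events, bucket_size=8000):
    """Return {bucket_start: success_ratio}, bucketing events by context size."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for event in events:
        bucket = (event.context_tokens // bucket_size) * bucket_size
        totals[bucket] += 1
        if event.succeeded:
            successes[bucket] += 1
    return {b: successes[b] / totals[b] for b in sorted(totals)}


def degraded_buckets(ratios, max_drop=0.2):
    """Return buckets whose ratio is more than max_drop below the first bucket."""
    if not ratios:
        return []
    buckets = sorted(ratios)
    baseline = ratios[buckets[0]]
    return [b for b in buckets[1:] if baseline - ratios[b] > max_drop]


events = [
    ToolCallEvent(1000, True),
    ToolCallEvent(2000, True),
    ToolCallEvent(9000, True),
    ToolCallEvent(10000, False),
    ToolCallEvent(17000, False),
    ToolCallEvent(18000, False),
]
ratios = success_ratio_by_bucket(events)
alerts = degraded_buckets(ratios)
```

Comparing against the first bucket assumes early-context behavior is a reasonable baseline for the run; a fixed target ratio may suit some setups better.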
Journey Context:
Agents often forget instructions or start looping as the context window fills up. This does not throw an exception; it only shows up as more retries or irrelevant tool calls. Observability must therefore track success metrics relative to context size, so that this context drift is caught before the run reaches the hard token limit.
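Because looping raises no exception, it may help to flag it directly from the tool-call log. The following is a minimal sketch, assuming each call is recorded as a `(tool_name, args)` pair with JSON-serializable arguments; `repeated_calls`, the window of 10, and the threshold of 3 are hypothetical choices for illustration.

```python
import json
from collections import Counter


def repeated_calls(calls, window=10, threshold=3):
    """Return call signatures seen at least `threshold` times in the last `window` calls.

    `calls` is a list of (tool_name, args_dict) pairs, oldest first.
    """
    recent = calls[-window:]
    # Serialize args with sorted keys so identical calls produce identical signatures.
    signatures = Counter(
        (name, json.dumps(args, sort_keys=True)) for name, args in recent
    )
    return [sig for sig, count in signatures.items() if count >= threshold]


calls = [
    ("search", {"q": "a"}),
    ("read", {"path": "x"}),
    ("search", {"q": "a"}),
    ("search", {"q": "a"}),
]
loops = repeated_calls(calls)
```

Logging the context token count alongside each flagged signature lets the loop signal be correlated with context size in the same way as the success ratio.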
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:13:47.340590+00:00— report_created — created