Report #7162
[research] Agent performance drops on long conversations, but telemetry shows only 200 OK responses, masking context-window degradation
Add telemetry to track prompt_tokens over the lifecycle of the agent run. Alert when prompt_tokens approaches the model's context limit, and correlate token count with tool-call failure rates.
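A minimal sketch of the per-run tracking described above, in Python. The `RunTelemetry` class, the `context_limit` value, and the 0.8 `alert_ratio` are hypothetical choices for illustration, not part of any provider SDK; in practice `prompt_tokens` would come from the usage data your provider returns with each response.

```python
from dataclasses import dataclass, field


@dataclass
class RunTelemetry:
    """Per-run record of prompt size and tool-call outcomes (hypothetical helper)."""
    context_limit: int        # assumed context limit of the model, in tokens
    alert_ratio: float = 0.8  # assumed alert threshold; tune for your agent
    steps: list = field(default_factory=list)

    def record(self, prompt_tokens: int, tool_call_failed: bool) -> bool:
        """Store one step; return True if prompt_tokens is near the limit."""
        self.steps.append((prompt_tokens, tool_call_failed))
        return prompt_tokens >= self.alert_ratio * self.context_limit

    def failure_rate(self) -> float:
        """Fraction of recorded steps whose tool call failed."""
        if not self.steps:
            return 0.0
        return sum(failed for _, failed in self.steps) / len(self.steps)


# Illustrative values only, not measurements.
run = RunTelemetry(context_limit=128_000)
alerts = [
    run.record(prompt_tokens=4_000, tool_call_failed=False),
    run.record(prompt_tokens=60_000, tool_call_failed=False),
    run.record(prompt_tokens=110_000, tool_call_failed=True),
]
```

Here only the third step crosses the assumed threshold. Storing the raw `(prompt_tokens, tool_call_failed)` pairs, rather than just the alert flag, keeps the data available for later correlation.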
Journey Context:
As context length grows, LLMs often suffer from attention degradation, which shows up as ignored instructions or malformed tool calls. Standard API metrics (latency, status codes) still look fine. Plotting token count against error rate per trace lets you empirically identify the effective context limit for your specific agent, which is usually much lower than the provider's theoretical maximum.
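One possible way to run the token-count vs. error-rate comparison, sketched in Python. The function names, the 10,000-token bucket size, and the 0.2 error-rate cutoff are assumptions for illustration, and the sample data is made up to show the shape of the output, not taken from a real agent.

```python
from collections import defaultdict


def error_rate_by_bucket(samples, bucket_size=10_000):
    """Group (prompt_tokens, failed) samples into fixed-width token buckets.

    Returns {bucket_start: failure_rate}, ordered by bucket_start.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for prompt_tokens, failed in samples:
        bucket = (prompt_tokens // bucket_size) * bucket_size
        totals[bucket] += 1
        failures[bucket] += int(failed)
    return {b: failures[b] / totals[b] for b in sorted(totals)}


def effective_limit(rates, max_error_rate=0.2):
    """Return the start of the first bucket whose failure rate exceeds
    max_error_rate, or None if no bucket does."""
    for bucket_start, rate in rates.items():
        if rate > max_error_rate:
            return bucket_start
    return None


# Made-up samples: (prompt_tokens, tool_call_failed)
samples = [
    (3_000, False), (8_000, False),
    (15_000, False), (18_000, False),
    (24_000, False), (27_000, True),
    (33_000, True), (36_000, True),
]
rates = error_rate_by_bucket(samples)
limit = effective_limit(rates)
```

With these made-up samples, the failure rate first exceeds the cutoff in the bucket starting at 20,000 tokens. On real traces, buckets with few samples are noisy, so a minimum sample count per bucket is worth adding before trusting the result.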
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T02:04:17.491907+00:00 — report_created — created