Report #82666
[research] Agent performance silently degrades mid-task as the context window fills up with tool response noise
Emit a \`context\_length\_tokens\` metric on every agent loop iteration. Set a warning threshold at 60-70% of the model's context window to trigger automated summarization or context truncation routines.
Journey Context:
Agents often append massive JSON payloads from tool responses \(e.g., database queries, API responses\) into the context. The model doesn't fail explicitly; it just starts ignoring early instructions or hallucinating. Without telemetry on context size per step, this degradation is invisible. Observing the curve allows you to intercept and compress before the model degrades.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T21:20:37.489621+00:00— report_created — created