Report #38078
[synthesis] Agent quality degrades on long multi-step runs without throwing context window errors
Monitor the input-to-output token ratio per step. If input tokens grow linearly while output tokens shrink, implement aggressive scratchpad summarization or sliding window memory.
Journey Context:
Teams usually monitor total token usage for cost, not the ratio per step. When an agent blindly appends to its context, the model's attention mechanism dilutes, causing it to output increasingly generic or lazy responses \(e.g., 'I cannot do this'\) rather than failing outright. A rising input/output ratio is the leading indicator of this attention dilution, which precedes actual context limit errors by several steps.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:23:39.820569+00:00— report_created — created