Report #40191
[gotcha] AI response quality silently degrades as context window fills with no UI indication
Track token usage relative to the model's context window and surface a progressive warning to the user (e.g., 'This conversation is getting long; responses may be less precise. Consider starting a new thread.'). Summarize earlier context automatically when usage approaches 70-80% of the context limit. Never silently truncate context without informing the user.
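A minimal sketch of the usage check described above, in Python. The names `check_context`, `ContextStatus`, `WARN_RATIO`, and `SUMMARIZE_RATIO` are hypothetical, and the 0.70 and 0.80 values simply mirror the 70-80% range in this report. The token counts are assumed to come from whatever tokenizer or usage field your model provider exposes.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed thresholds, taken from the 70-80% range in this report.
WARN_RATIO = 0.70       # start warning the user here
SUMMARIZE_RATIO = 0.80  # start compacting older context here

WARNING_TEXT = (
    "This conversation is getting long; responses may be less precise. "
    "Consider starting a new thread."
)


@dataclass
class ContextStatus:
    ratio: float                 # used_tokens / limit_tokens
    should_summarize: bool       # True once SUMMARIZE_RATIO is reached
    user_message: Optional[str]  # warning to show in the UI, or None


def check_context(used_tokens: int, limit_tokens: int) -> ContextStatus:
    """Compare token usage with the context limit and decide what to surface."""
    if limit_tokens <= 0:
        raise ValueError("limit_tokens must be positive")
    ratio = used_tokens / limit_tokens
    message = WARNING_TEXT if ratio >= WARN_RATIO else None
    return ContextStatus(ratio, ratio >= SUMMARIZE_RATIO, message)
```

Calling `check_context` after every turn and rendering `user_message` whenever it is not `None` gives the user a visible signal before any context is dropped or summarized.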
Journey Context:
As a conversation grows, the model's effective attention degrades, particularly for information in the middle of the context (the 'lost in the middle' phenomenon). The user sees no indication of this; responses gradually become less grounded in earlier context. Teams typically discover the problem when users report 'the AI forgot what I told it', with no obvious failure in logs or metrics. The naive fix, switching to a model with a larger context window, delays the problem but does not solve it, because attention degradation scales with context length. The real fix is proactive context management: summarization of older turns, retrieval-augmented context injection, and user-facing signals that the conversation is approaching its effective limits. The 70-80% threshold matters because degradation begins well before the hard context limit.
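One way to sketch the "summarization of older turns" step in Python, under stated assumptions: `compact_history` is a hypothetical helper, `count_tokens` and `summarize` are caller-supplied callables (for example a tokenizer wrapper and a model call), and the default `trigger_ratio` of 0.75 is just a point inside the 70-80% range. The function returns a flag so the caller can tell the user that compaction happened rather than doing it silently.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (role, text)


def compact_history(
    turns: List[Turn],
    count_tokens: Callable[[str], int],
    summarize: Callable[[List[Turn]], str],
    limit_tokens: int,
    trigger_ratio: float = 0.75,  # assumed value within the 70-80% range
    keep_recent: int = 4,         # most recent turns kept verbatim
) -> Tuple[List[Turn], bool]:
    """Replace older turns with one summary turn once usage nears the limit.

    Returns (turns, compacted). When compacted is True, the caller should
    inform the user that earlier context was summarized.
    """
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    used = sum(count_tokens(text) for _, text in turns)
    if used < trigger_ratio * limit_tokens or len(turns) <= keep_recent:
        return turns, False
    older, recent = turns[:-keep_recent], turns[-keep_recent:]
    summary = ("system", "Summary of earlier conversation: " + summarize(older))
    return [summary] + recent, True
```

Keeping the latest turns verbatim and summarizing only the older ones preserves the part of the conversation the user is most likely to refer to next. Retrieval-augmented injection could replace or complement the `summarize` step, but it is not shown here.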
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:55:59.634880+00:00— report_created — created