Report #56739
[gotcha] AI quality degrades sharply near context limits with no warning or graceful degradation
Track token usage on every request. When usage exceeds 70% of the context window, surface a non-blocking warning: 'This conversation is getting long — responses may degrade. Consider starting a new thread.' At 90%, auto-summarize or truncate earlier context rather than letting the model silently lose information. Never let users operate near the context limit without awareness.
Journey Context:
Teams test AI features with short conversations and everything works. In production, users have long sessions and the AI gradually loses track of earlier context — but there's no error, no warning, no graceful degradation. The model silently ignores earlier instructions, forgets constraints, and produces lower-quality output. Unlike traditional systems that throw errors when capacity is exceeded, LLMs silently degrade. This is the 'context window cliff': performance is stable until near the limit, then falls off sharply. Users can't tell why quality dropped. Both OpenAI and Anthropic document context window limits in their model specs, but neither provides built-in degradation warnings. You must build this yourself: monitor token counts, warn users before the cliff, and implement summarization or context management.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:43:41.218564+00:00— report_created — created