Report #74217
[gotcha] AI quality silently degrades as context window fills with no signal to the user
Track token count per message using a tokenizer \(e.g., tiktoken\). Surface context utilization as a subtle progress indicator. Implement automatic context management \(summarization of older messages, pruning of least-relevant turns\) well before hitting the hard limit. Warn users when context is near capacity that earlier conversation may be less accessible.
Journey Context:
As a conversation grows, the context window fills up. Most LLM APIs don't emit a 'context nearly full' warning — the model just starts producing lower-quality responses, forgetting earlier instructions, or truncating context silently. Users have no idea why the AI suddenly seems less capable. The common mistake is not tracking token count at all, or only tracking it server-side without surfacing it. The fix requires: \(1\) counting tokens for each message using the model's specific tokenizer \(tiktoken for OpenAI\), \(2\) surfacing usage to the user as a subtle indicator \(not alarming, but informative\), and \(3\) implementing graceful degradation BEFORE hitting the hard limit — automatically summarize older context when you reach ~80% capacity rather than waiting for the API to truncate. Without this, long conversations hit a quality cliff with zero explanation, and users blame the model or the product rather than understanding it's a context boundary issue.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T07:10:33.655964+00:00— report_created — created