Report #57444
[gotcha] Context window exhaustion causes silent quality degradation with no error signal
Track the approximate token count of the conversation history. When it approaches 70-80% of the model's context window, surface a soft warning to the user ('This conversation is getting long. The AI may start forgetting earlier context.'). Implement automatic summarization or context windowing for long conversations. Never let the context silently overflow without user awareness.
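A minimal sketch of the tracking and soft-warning approach described above, in Python. Everything named here is hypothetical and for illustration only: the `ConversationBudget` class, the `estimate_tokens` helper, and the 4-characters-per-token heuristic are assumptions, not part of any provider's API. A real integration would likely use the target model's own tokenizer and its documented context size. Summarization is not shown because it needs a model call; the sketch only covers the simpler context-windowing option of dropping the oldest non-system messages.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumption: roughly 4 characters per token for English text.
# This is a crude heuristic; swap in the model's real tokenizer if available.
CHARS_PER_TOKEN = 4

WARNING = (
    "This conversation is getting long. "
    "The AI may start forgetting earlier context."
)


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate for a piece of text (hypothetical helper)."""
    return max(1, len(text) // CHARS_PER_TOKEN)


@dataclass
class ConversationBudget:
    """Tracks approximate token usage against a context window (illustrative)."""

    context_window: int        # the model's token limit, supplied by the caller
    warn_ratio: float = 0.75   # soft-warning threshold, within the 70-80% range
    messages: List[Dict[str, str]] = field(default_factory=list)

    def used_tokens(self) -> int:
        """Sum the estimated tokens of every message in the history."""
        return sum(estimate_tokens(m["content"]) for m in self.messages)

    def add(self, role: str, content: str) -> Optional[str]:
        """Append a message; return a warning string once the threshold is hit."""
        self.messages.append({"role": role, "content": content})
        if self.used_tokens() >= self.warn_ratio * self.context_window:
            return WARNING
        return None

    def trim_oldest(self, target_ratio: float = 0.5) -> int:
        """Drop the oldest non-system messages until usage is at or below
        target_ratio of the window. Returns the number of messages dropped."""
        target = target_ratio * self.context_window
        dropped = 0
        while self.used_tokens() > target:
            idx = next(
                (i for i, m in enumerate(self.messages) if m["role"] != "system"),
                None,
            )
            if idx is None:  # only system messages remain; nothing left to drop
                break
            self.messages.pop(idx)
            dropped += 1
        return dropped
```

In use, the caller would check the return value of `add` after each turn and show the warning in the UI when it is not `None`, then decide whether to call `trim_oldest` or a summarization step. Trimming silently would recreate the original problem, so the user should be told when earlier messages are dropped.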
Journey Context:
Traditional software fails loudly when resources are exhausted: out-of-memory errors, disk-full warnings. LLM context windows fail silently. As the conversation approaches the token limit, the model doesn't throw an error. Instead, it begins to 'forget' earlier context, produce increasingly generic responses, or contradict earlier statements. There is no API error, no status code change, no finish_reason variation. The responses just get worse. Users interpret this degradation as the AI being dumb or broken without understanding the cause. The API will eventually return an error if you exceed the limit entirely, but the quality degradation happens well before that hard cutoff. This is particularly dangerous in professional settings, where users have long working sessions and don't realize the AI has stopped tracking their earlier requirements.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T02:54:38.048649+00:00— report_created — created