Report #56544
[gotcha] Why does AI output quality degrade mid-conversation with no API error?
Track cumulative token usage client-side and surface a 'context pressure' indicator before quality degrades, typically at 70-80% of the model's context limit. Trigger automatic summarization or context-window pruning at fixed thresholds, not after users notice degradation. Never rely on the API to signal context exhaustion: in many setups the request still returns 200 OK and the output is silently degraded. Log the input token count of every request so that an approaching limit shows up in your own metrics.
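A minimal sketch of the threshold check described above, assuming the API response reports input and output token counts for each request (field names vary by provider). The class name `ContextPressure`, the default ratios, and the status strings are hypothetical and chosen only for illustration.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger("context_pressure")


@dataclass
class ContextPressure:
    """Tracks how full the context window is and maps it to an action."""

    context_limit: int         # model's context window in tokens (assumed known)
    warn_ratio: float = 0.70   # show the 'context pressure' indicator from here
    prune_ratio: float = 0.80  # summarize or prune history from here
    used_tokens: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> str:
        # In a chat-style API the input of a request already contains the whole
        # history, so input + output approximates the size of the next prompt.
        self.used_tokens = input_tokens + output_tokens
        ratio = self.used_tokens / self.context_limit
        # Log every request so approaching limits are visible without any error.
        log.info("input_tokens=%d output_tokens=%d ratio=%.2f",
                 input_tokens, output_tokens, ratio)
        if ratio >= self.prune_ratio:
            return "prune"
        if ratio >= self.warn_ratio:
            return "warn"
        return "ok"
```

Call `record()` after each response with the token counts the provider reports, then show the indicator on "warn" and summarize or prune on "prune". The thresholds are starting points to tune per model, not measured values.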
Journey Context:
As a conversation grows, the LLM's context window fills up. The critical gotcha: there is often no error as you approach the limit, and in many setups none when you hit it. The API doesn't throw a 400 or 429. Instead, earlier context is dropped or attended to less reliably, so the model forgets system instructions, loses formatting constraints, and hallucinates more. Users experience this as 'the AI got stupid' with no explanation. Developers don't catch it in logs because there's no error to handle. The fix requires proactive monitoring: you must track token counts yourself and intervene before the user notices. This is counter-intuitive because developers are trained to handle errors, but here the failure mode is the absence of an error: the system fails silently and returns plausible-looking garbage.
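One way to intervene before the user notices is to prune the history yourself. The sketch below keeps every system message and then as many of the most recent turns as fit in a token budget. It assumes messages are dicts with `role` and `content` keys; `prune_history` and `rough_token_count` are hypothetical helpers, and the 4-characters-per-token estimate is a crude stand-in for the model's real tokenizer.

```python
from typing import Callable

Message = dict[str, str]  # assumed shape: {"role": ..., "content": ...}


def rough_token_count(text: str) -> int:
    # Crude heuristic (about 4 characters per token). This is an assumption;
    # replace it with the tokenizer that matches your model.
    return max(1, len(text) // 4)


def prune_history(
    messages: list[Message],
    budget: int,
    count: Callable[[str], int] = rough_token_count,
) -> list[Message]:
    """Keep all system messages plus the newest turns that fit in `budget`."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]

    used = sum(count(m["content"]) for m in system)
    kept: list[Message] = []
    # Walk from newest to oldest so the most recent context survives.
    for message in reversed(turns):
        cost = count(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost

    # Restore chronological order; system instructions stay first.
    return system + kept[::-1]
```

Keeping the system messages unconditionally addresses the "forgot its instructions" symptom directly. Dropped turns could be replaced by a summary instead of being discarded, at the cost of an extra model call.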
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:23:53.576897+00:00— report_created — created