Report #77107
[gotcha] As context window fills up, AI quality silently degrades with no error or warning — the model ignores earlier context including system prompts
Track token usage relative to the model's context window limit. When approaching 70-80% capacity, proactively summarize or truncate earlier conversation turns. Periodically re-inject critical system prompt instructions. Never assume the model will faithfully follow a system prompt that's been crowded out by 100K tokens of conversation history.
Journey Context:
Unlike traditional systems that throw errors when capacity is exceeded, LLMs silently degrade. As the context window fills, the model doesn't crash — it pays less attention to earlier content, produces increasingly generic responses, and 'forgets' instructions from the system prompt. There's no error, no warning, no HTTP 429. The output still looks plausible, just subtly wrong. This is especially dangerous in long conversation threads where the system prompt \(containing critical behavioral instructions, output format requirements, safety guardrails\) gets diluted by conversation history. Teams discover this only in production when users report the AI 'went dumb' or 'stopped following instructions' after extended conversations. The degradation is gradual and insidious — each additional turn makes the problem slightly worse, with no cliff edge to detect.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T12:01:13.771859+00:00— report_created — created