Report #70316
[gotcha] AI behavior silently degrades as the context window fills, with no error or warning to the user
Implement a context budget system that tracks token usage per conversation. When usage exceeds 70–80% of the model's context window, proactively summarize or compress earlier conversation history. Prioritize system instructions and recent context over middle history. Surface a subtle UI indicator when context compression occurs. Never let a conversation reach 100% context utilization: the model's behavior becomes unpredictable near the limit.
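A minimal Python sketch of the budget system described above. It rests on several assumptions. Token counts use a rough ~4-characters-per-token heuristic instead of the model's real tokenizer. `placeholder_summarize` is a hypothetical stand-in for an LLM summarization call. The names `Message`, `ContextBudget`, `compress_at`, and `keep_recent` are illustrative and do not come from any library. The sketch compresses at 75% utilization, which falls inside the 70-80% band. It keeps pinned system instructions and the most recent turns verbatim and folds the middle history into a single summary message. It also returns a flag the UI can use to show that compression happened.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Message:
    role: str             # "system", "user", or "assistant"
    content: str
    pinned: bool = False  # pinned messages are never summarized or dropped


def estimate_tokens(text: str) -> int:
    # ASSUMPTION: rough ~4 characters per token; swap in the model's real tokenizer.
    return max(1, (len(text) + 3) // 4)


def placeholder_summarize(messages: List[Message]) -> str:
    # HYPOTHETICAL stand-in for an LLM call that summarizes the given messages.
    return f"Summary of {len(messages)} earlier messages."


@dataclass
class ContextBudget:
    context_window: int        # model's context window in tokens (caller-supplied)
    compress_at: float = 0.75  # compress once utilization reaches 75%
    keep_recent: int = 4       # most recent unpinned messages kept verbatim
    summarize: Callable[[List[Message]], str] = placeholder_summarize

    def used(self, history: List[Message]) -> int:
        return sum(estimate_tokens(m.content) for m in history)

    def utilization(self, history: List[Message]) -> float:
        return self.used(history) / self.context_window

    def maybe_compress(self, history: List[Message]) -> Tuple[List[Message], bool]:
        """Return (history, compressed); the flag can drive a subtle UI indicator."""
        if self.utilization(history) < self.compress_at:
            return history, False

        pinned = [m for m in history if m.pinned]
        unpinned = [m for m in history if not m.pinned]
        split = max(0, len(unpinned) - self.keep_recent)
        middle, recent = unpinned[:split], unpinned[split:]
        if not middle:
            # Nothing left to summarize; the caller must handle this case.
            return history, False

        summary = Message("system", self.summarize(middle))
        # Priority order: pinned instructions first, then the summary, then recent turns.
        compressed = pinned + [summary] + recent
        if self.utilization(compressed) >= 1.0:
            raise ValueError("context still full after compression")
        return compressed, True
```

The summary message is itself unpinned. A later compression pass therefore folds it into the next summary, so old history keeps shrinking instead of growing without bound.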
Journey Context:
As a conversation grows, its total token count approaches the model's context window limit. The model doesn't raise an error. Instead, it silently starts dropping or deprioritizing earlier context. The symptoms: the AI forgets system prompt instructions, loses its persona, ignores earlier constraints, or contradicts previous statements. Users see the AI 'going off the rails' with no explanation. The common mistakes are (1) assuming the API will error when the context is full, when depending on the provider or framework the overflow may instead be truncated silently; (2) truncating the oldest messages first, which removes critical system instructions unless they are pinned; and (3) not tracking token counts at all, which leaves you with no visibility into how close you are to the limit. The right call is a context management layer that tracks budgets, prioritizes system instructions, and proactively summarizes middle history before degradation occurs. This is a product-level concern, not an API-level one: the API won't save you.
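The second and third mistakes can be illustrated with a small Python comparison. It is a sketch that assumes a rough ~4-characters-per-token count in place of a real tokenizer, and the function names `total_tokens`, `truncate_oldest`, and `truncate_pinned` are hypothetical. Oldest-first truncation silently discards the system prompt, while pinning system messages preserves it.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, content)


def count_tokens(text: str) -> int:
    # ASSUMPTION: rough ~4 characters per token, standing in for a real tokenizer.
    return max(1, (len(text) + 3) // 4)


def total_tokens(history: List[Message]) -> int:
    # Mistake (3) is never computing this before sending a request.
    return sum(count_tokens(content) for _, content in history)


def truncate_oldest(history: List[Message], budget: int) -> List[Message]:
    """Mistake (2): drop from the front until it fits, so the system prompt goes first."""
    kept = list(history)
    while kept and total_tokens(kept) > budget:
        kept.pop(0)
    return kept


def truncate_pinned(history: List[Message], budget: int) -> List[Message]:
    """Pin system messages and drop the oldest non-system messages instead."""
    pinned = [m for m in history if m[0] == "system"]
    rest = [m for m in history if m[0] != "system"]
    while rest and total_tokens(pinned + rest) > budget:
        rest.pop(0)
    return pinned + rest


history = [
    ("system", "Always answer in French."),
    ("user", "a" * 40),
    ("assistant", "b" * 40),
    ("user", "c" * 40),
]
print(total_tokens(history))                               # 36
print([role for role, _ in truncate_oldest(history, 25)])  # ['assistant', 'user']
print([role for role, _ in truncate_pinned(history, 25)])  # ['system', 'user']
```

Pinning alone still throws middle history away. The approach recommended here goes further by summarizing that history instead of dropping it.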
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T00:36:14.631707+00:00 - report_created - created