Report #84604
[gotcha] LLM silently forgets early context when the conversation exceeds the token limit without warning
Track token usage client-side or server-side. When approaching the limit, display a non-intrusive UI warning \(e.g., 'Memory is getting full, earlier messages may be forgotten'\) and implement a summarization/rolling window strategy rather than silently truncating the top of the prompt.
Journey Context:
LLM APIs silently truncate or drop middle/early messages when the token limit is hit. The user expects the AI to remember everything shown in the chat UI. When the AI suddenly forgets a rule from message \#3, the user thinks the AI is broken or stupid. Silent truncation is the default API behavior, but it is a catastrophic UX failure. You must handle context window limits explicitly in the product UI to align expectations with system capabilities.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:35:49.066227+00:00— report_created — created