Report #58816
[gotcha] Context window overflow causes silent quality degradation, not a clear error
Implement client-side token counting (e.g., `tiktoken`). Show a context usage indicator. Proactively summarize or truncate conversation history before hitting the limit. Never rely on the API to signal context overflow clearly.
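A minimal sketch of the counting and usage-indicator steps, under stated assumptions. The names `estimate_tokens`, `context_usage`, and `usage_status` are hypothetical, as is the per-message overhead of 4 tokens. The 4-characters-per-token estimate is a crude stand-in so the example runs without dependencies. In real use you would likely pass a tokenizer-backed counter that matches your model.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def estimate_tokens(text: str) -> int:
    # Crude stand-in: roughly 4 characters per token (an assumption, not exact).
    # With tiktoken installed, a counter could instead be something like
    # len(tiktoken.get_encoding("cl100k_base").encode(text)); the right
    # encoding depends on the model in use.
    return (len(text) + 3) // 4


def context_usage(
    messages: List[Message],
    context_limit: int,
    count: Callable[[str], int] = estimate_tokens,
    per_message_overhead: int = 4,  # hypothetical allowance for role/format tokens
) -> float:
    """Return the fraction of the context window the messages occupy."""
    used = sum(count(m["content"]) + per_message_overhead for m in messages)
    return used / context_limit


def usage_status(fraction: float, warn_at: float = 0.8) -> str:
    """Map a usage fraction to a label a UI indicator could display."""
    if fraction >= 1.0:
        return "over"
    if fraction >= warn_at:
        return "warn"
    return "ok"
```

Calling `usage_status(context_usage(history, limit))` before each request gives the UI something to display and gives the client a trigger for summarizing or truncating early.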
Journey Context:
Developers expect that exceeding the context window will throw a clear error. In practice, behavior is inconsistent: some requests error, but many succeed with the API silently truncating earlier messages. The model then responds coherently but is missing crucial context. It "forgets" earlier instructions, and nothing tells you or the user that this happened. This is especially dangerous in long chat sessions, where the oldest messages, often the system instructions, are the first to be dropped. The user sees a response that looks fine but violates constraints set earlier. The fix is to never trust the API to manage your context budget: count tokens yourself, warn at 80% capacity, and apply a summarization or sliding-window strategy before you hit the wall.
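The sliding-window strategy can be sketched as follows. This is an illustration, not a definitive implementation: `trim_history` and `estimate_tokens` are hypothetical names, the token estimate is a rough 4-characters-per-token heuristic, and the message format (dicts with `role` and `content`) is assumed. The idea is to always keep system messages, then keep as many of the newest remaining messages as the budget allows.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token.
    # Swap in a real tokenizer-backed counter for production use.
    return (len(text) + 3) // 4


def trim_history(
    messages: List[Message],
    budget: int,
    count: Callable[[str], int] = estimate_tokens,
) -> List[Message]:
    """Keep every system message plus the newest other messages that fit."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    # System messages are never dropped, so they are charged first.
    # If they alone exceed the budget, only they are returned.
    used = sum(count(m["content"]) for m in system)

    kept: List[Message] = []
    for m in reversed(rest):  # walk from newest to oldest
        cost = count(m["content"])
        if used + cost > budget:
            break  # stop at the first message that does not fit
        kept.append(m)
        used += cost

    kept.reverse()  # restore chronological order
    # Note: system messages are moved to the front of the result.
    return system + kept
```

Setting `budget` to roughly 80% of the model's context limit leaves headroom for the reply. A summarization variant would replace the dropped messages with a single summary message instead of discarding them.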
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T05:12:32.530239+00:00— report_created — created