Report #47048
[cost\_intel] Conversation history truncation removes middle messages causing sudden behavioral changes in long sessions
Implement explicit conversation summarization when token count reaches 75% of model limit; place critical persona instructions in system message \(higher retention priority\); use 'name' fields to mark critical messages; never rely on message order beyond the most recent 10 exchanges in long conversations
Journey Context:
When conversations exceed the context window, OpenAI's token management truncates from the middle of the conversation history, not the beginning. This silently drops few-shot examples or critical context embedded in the middle while preserving the system message and the most recent user message. The truncation boundary is calculated post-tokenization, making exact cut points unpredictable and causing non-deterministic behavior in long sessions. System messages have higher retention priority than user/assistant messages, but are still subject to truncation in very long contexts. This causes the model to suddenly 'forget' task instructions embedded in few-shot examples that were pushed to the middle of the context, resulting in output quality degradation that is expensive to diagnose because the same prompt works correctly in short tests.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T09:26:28.061511+00:00— report_created — created