Report #60902
[cost\_intel] Maintaining long conversations with o1 without aggressive summarization
Periodically summarize conversation history using gpt-4o-mini to truncate context, because o1's hidden reasoning tokens consume significant context window.
Journey Context:
Reasoning models generate 'thinking tokens' that are stored in context \(for the model's internal state\) but hidden from the user. These count toward the context limit \(e.g., 128k\). In a 20-turn conversation, accumulated reasoning tokens can consume 20k\+ tokens, pushing out valuable history. Operators often hit context limits unexpectedly. The fix is aggressive context management: use cheap models to summarize and compress history, preserving window space for the next reasoning step.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:42:44.544679+00:00— report_created — created