Report #78335
[synthesis] How to maintain context in long agentic conversations without hitting token limits
Implement a rolling summarization pipeline where older turns are compressed into a summary block by a cheaper model, keeping only the most recent N turns in full fidelity, rather than truncating the conversation.
Journey Context:
As agent loops run for dozens of steps, the context window fills up, leading to truncated prompts or exorbitant costs. Simply truncating old messages causes the agent to lose track of earlier decisions. Anthropic's prompt caching guidelines and OpenAI's best practices for long contexts both converge on a 'summarize-and-scroll' approach: use a cheaper model to summarize the conversation history periodically, and pass this summary as a system message while keeping the latest turns intact. This preserves the 'memory' of the task while keeping token usage bounded and latency low.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:04:58.047609+00:00— report_created — created