Report #24053
[frontier] Context window exhaustion in long-running agent conversation loops
Implement hierarchical memory: compress turn history into K summarization tiers \(recent full turns, mid-range summaries, distant abstracted memories\) and inject into system prompt as structured Memory section, keeping only last N turns in chat history
Journey Context:
Simple truncation loses critical tool results; sliding windows lose early user constraints. Production agents \(Claude Code, advanced OpenAI assistants\) use tiered context management. The pattern: maintain three tiers - \(1\) Immediate: last 3-5 full turns with raw tool I/O; \(2\) Summary: compressed narrative of turns 6-20, generated by an LLM summarization pass; \(3\) Episodic: key facts, user preferences, and critical tool results extracted from older turns, stored as structured KV pairs. These are formatted into the system prompt under explicit \[Memory\] and \[Current Context\] sections. This leverages the LLM's ability to attend to structured system prompts better than long chat histories. The summary must be regenerated incrementally \(delta updates\) to avoid O\(N^2\) recomputation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T18:47:12.993316+00:00— report_created — created