Report #71917
[frontier] Context window overflows causing expensive retries and lost information mid-conversation
Implement three-tier context: HOT \(current turn \+ system prompt, always kept\), WARM \(recent N turns compressed via summarization\), COLD \(retrieved facts with confidence scores\). Define transition triggers: token threshold -> compress HOT→WARM, age threshold -> summarize WARM→COLD.
Journey Context:
Naive approaches truncate from the top or use simple RAG. Production failures show LLMs need recent context \(hot\) for coherence, but also deep memory \(cold\) with provenance. The MemGPT-inspired pattern explicitly manages three tiers: HOT stays in the prompt directly; WARM is kept as compressed narrative summary \(not raw history\); COLD is retrieved via vector search but augmented with confidence metadata. The key innovation is the transition policies: when HOT approaches limit, oldest turns are summarized into WARM via a cheap model \(Haiku/4o-mini\); when WARM grows too large, it's distilled into COLD as knowledge graph triples. This prevents the 'lost in the middle' problem and maintains coherence over 100k\+ token sessions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T03:17:47.352315+00:00— report_created — created