Report #100707
[architecture] Conversation history exceeds the LLM context window—should I use a longer-context model or build a memory paging system?
Build a tiered memory system. Keep a small working set (core memory + recent turns) in the prompt, and page older turns out to a recall store and an archival store through explicit function calls. Trigger eviction at roughly 70% of context capacity and summarize the evicted blocks. Rely on raw long context only for the immediate working set, not for unbounded history.
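A minimal Python sketch of the eviction loop, assuming a whitespace word count as a stand-in tokenizer and a caller-supplied summarizer. `TieredMemory`, `Turn`, `count_tokens`, `toy_summarize`, and the `(previous_summary, evicted_turns)` summarizer signature are hypothetical names, not an existing library. Core memory and a rolling summary stay pinned; once the prompt passes 70% of the window, the oldest half of the working set is paged out to a recall list and folded into the summary. Archival writes go through explicit function calls and are omitted here.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Turn:
    role: str
    text: str


def count_tokens(text: str) -> int:
    # Assumption: whitespace word count as a stand-in for the model's real tokenizer.
    return len(text.split())


# Hypothetical summarizer hook: (previous summary, evicted turns) -> new summary.
Summarizer = Callable[[str, List[Turn]], str]


@dataclass
class TieredMemory:
    context_limit: int                 # model context window, in tokens
    summarize: Summarizer              # e.g. a call to a cheaper LLM
    evict_ratio: float = 0.70          # evict once the prompt passes ~70% of capacity
    core: str = ""                     # core memory, pinned in every prompt
    summary: str = ""                  # rolling summary of evicted turns
    recent: List[Turn] = field(default_factory=list)  # working set, in the prompt
    recall: List[Turn] = field(default_factory=list)  # recall store, out of the prompt

    def prompt_tokens(self) -> int:
        return (count_tokens(self.core) + count_tokens(self.summary)
                + sum(count_tokens(t.text) for t in self.recent))

    def add_turn(self, turn: Turn) -> None:
        self.recent.append(turn)
        # Page out old turns while over the threshold, always keeping the newest turn.
        while (self.prompt_tokens() > self.evict_ratio * self.context_limit
               and len(self.recent) > 1):
            self._evict()

    def _evict(self) -> None:
        cut = max(1, len(self.recent) // 2)            # oldest half of the working set
        evicted, self.recent = self.recent[:cut], self.recent[cut:]
        self.recall.extend(evicted)                    # raw turns stay retrievable
        self.summary = self.summarize(self.summary, evicted)

    def build_prompt(self) -> List[str]:
        # Pinned content first, most recent turns last.
        parts = [f"[core] {self.core}"]
        if self.summary:
            parts.append(f"[summary] {self.summary}")
        parts += [f"[{t.role}] {t.text}" for t in self.recent]
        return parts


def toy_summarize(prev: str, turns: List[Turn]) -> str:
    # Toy stand-in for an LLM summary: only tracks how many turns were folded in.
    done = int(prev.split()[-1]) if prev else 0
    return f"turns summarized: {done + len(turns)}"


mem = TieredMemory(context_limit=32, summarize=toy_summarize, core="user is Ada")
for i in range(6):
    mem.add_turn(Turn("user", f"this is message number {i}"))
print(mem.build_prompt())
```

Evicting half the working set at a time, rather than one turn, is a judgment call: it amortizes summarizer calls but drops more verbatim context per eviction, which the recall store then has to cover.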
Journey Context:
Long-context models still exhibit U-shaped attention: facts in the middle of a stuffed context are systematically underattended compared with facts near the start or end. MemGPT showed that treating the context window as OS RAM and external stores as disk gives deterministic control over what the model sees. The alternative, dumping ever more history into the prompt, raises cost and latency and makes it more likely that facts buried mid-window are missed.
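The RAM/disk framing only works if the model can page data back in on demand. One way to sketch that is a dispatcher that executes function calls the model emits as JSON. The tool names `recall_search`, `archival_insert`, and `archival_search`, the `{"name", "arguments"}` call shape, and the substring matching are assumptions for illustration, not MemGPT's actual API.

```python
import json
from typing import Callable, Dict, List


class PagedStores:
    """Out-of-context 'disk': a recall store of past turns and an archival store of facts."""

    def __init__(self) -> None:
        self.recall: List[str] = []    # evicted conversation turns, verbatim
        self.archival: List[str] = []  # long-term facts the model chose to save

    # Hypothetical tool names exposed to the model; not MemGPT's actual API.
    def recall_search(self, query: str, limit: int = 3) -> List[str]:
        # Assumption: case-insensitive substring match stands in for embedding or BM25 search.
        q = query.lower()
        matches = [t for t in self.recall if q in t.lower()]
        return matches[-limit:] if limit > 0 else []  # the `limit` most recent matches

    def archival_insert(self, content: str) -> str:
        self.archival.append(content)
        return "ok"

    def archival_search(self, query: str, limit: int = 3) -> List[str]:
        q = query.lower()
        return [f for f in self.archival if q in f.lower()][:limit]


def dispatch(stores: PagedStores, call_json: str) -> str:
    """Run one model-issued call, e.g. {"name": ..., "arguments": {...}}, and return
    the JSON result that will be placed in the next prompt."""
    tools: Dict[str, Callable[..., object]] = {
        "recall_search": stores.recall_search,
        "archival_insert": stores.archival_insert,
        "archival_search": stores.archival_search,
    }
    call = json.loads(call_json)
    name, args = call["name"], call.get("arguments", {})
    if name not in tools:
        return json.dumps({"error": f"unknown tool: {name}"})
    return json.dumps({"result": tools[name](**args)})


stores = PagedStores()
stores.recall = ["user: my dog is named Rex", "assistant: noted", "user: I moved to Oslo"]
dispatch(stores, '{"name": "archival_insert", "arguments": {"content": "User lives in Oslo"}}')
print(dispatch(stores, '{"name": "recall_search", "arguments": {"query": "dog"}}'))
# → {"result": ["user: my dog is named Rex"]}
```

Returning each result as JSON keeps the paging step explicit: whatever a call returns is exactly what enters the next prompt, which is the deterministic control the RAM/disk analogy is after.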
Lifecycle
2026-07-02T04:57:32.406769+00:00— report_created — created