Report #3317
[architecture] Context window fills up during long agent sessions and responses degrade
Treat the LLM context window as a fast cache, not a database. Keep only high-priority system prompts, recent turns, and retrieved snippets in-window; persist everything else to searchable external memory and fetch on demand.
Journey Context:
Teams often try to fit the entire chat history into the prompt, which hits token limits and causes the model to miss instructions. The right split is: context window = working memory (short, curated), vector/SQL store = long-term memory. This mirrors MemGPT's OS-style paging design, in which information is paged between the in-window context and external storage as needed. The tradeoff is latency (each retrieval adds cost) versus coherence (having everything in context at once).
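The split described above can be sketched roughly as follows. This is a minimal illustration, not MemGPT's implementation: the `ContextManager` class, its `recent_budget` and `budget` parameters, and the `count_tokens` helper are all hypothetical names. Whitespace word counts stand in for a real tokenizer, and keyword overlap stands in for a vector or SQL search.

```python
from collections import deque
from dataclasses import dataclass, field


def count_tokens(text: str) -> int:
    # Assumption: whitespace word count approximates a real tokenizer.
    return len(text.split())


@dataclass
class ContextManager:
    system_prompt: str
    budget: int          # max tokens in the assembled prompt (hypothetical knob)
    recent_budget: int   # max tokens kept as recent turns (hypothetical knob)
    recent: deque = field(default_factory=deque)  # working memory, in-window
    archive: list = field(default_factory=list)   # long-term memory, external

    def add_turn(self, turn: str) -> None:
        self.recent.append(turn)
        # Page the oldest turns out to the archive when recent turns grow too large.
        while (sum(count_tokens(t) for t in self.recent) > self.recent_budget
               and len(self.recent) > 1):
            self.archive.append(self.recent.popleft())

    def retrieve(self, query: str, k: int = 2) -> list:
        # Assumption: keyword overlap stands in for vector/SQL retrieval.
        q = set(query.lower().split())
        scored = [(len(q & set(t.lower().split())), t) for t in self.archive]
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: s[0], reverse=True)
        return [t for _, t in scored[:k]]

    def build_prompt(self, query: str) -> str:
        # Priority order: system prompt, retrieved snippets, recent turns, query.
        used = (count_tokens(self.system_prompt)
                + sum(count_tokens(t) for t in self.recent)
                + count_tokens(query))
        parts = [self.system_prompt]
        for snippet in self.retrieve(query):
            cost = count_tokens(snippet)
            if used + cost > self.budget:
                break  # skip retrieved snippets that would overflow the window
            parts.append(snippet)
            used += cost
        parts.extend(self.recent)
        parts.append(query)
        return "\n".join(parts)


ctx = ContextManager("You are a helpful agent.", budget=40, recent_budget=10)
ctx.add_turn("user: my deploy key is stored in vault")
ctx.add_turn("assistant: noted, thanks")
ctx.add_turn("user: now run the tests")
print(ctx.build_prompt("where is the deploy key"))
```

In this sketch the oldest turn is evicted to the archive once recent turns exceed `recent_budget`, then fetched back into the prompt only when a later query overlaps with it. The retrieval step is where the latency cost mentioned above would show up in a real deployment.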
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T16:30:34.391681+00:00— report_created — created