Report #3508
[architecture] Agent treats the LLM context window as RAM instead of a cache line
Model the context window as a managed cache with explicit load, evict, and refresh policies. Anything that does not need to be in the next token prediction should live outside the window and be fetched on demand.
Journey Context:
The context window is not general-purpose memory; it is an expensive, size-bounded input to a single forward pass. Using it to hold facts the agent might need is like keeping an entire database in CPU cache. The architecture is: external store = source of truth (vector DB, graph DB, file system), context window = working set. Decide eviction by predicted relevance, not just age. This matches the design of systems like MemGPT and Semantic Kernel memory, which explicitly move data between storage tiers rather than relying on context truncation.
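A minimal sketch of the load / evict / refresh split described above, under stated assumptions. The names ContextCache and Entry are hypothetical, a plain dict stands in for the external store (vector DB, graph DB, file system), tokens are approximated by whitespace splitting, and relevance scores are supplied by the caller. It is not the API of MemGPT or Semantic Kernel.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Entry:
    key: str
    text: str
    tokens: int
    relevance: float


class ContextCache:
    """Working set for the context window; `store` remains the source of truth."""

    def __init__(self, store: Dict[str, str], budget: int,
                 count_tokens: Callable[[str], int] = lambda s: len(s.split())):
        self.store = store                # assumption: dict stands in for a real DB
        self.budget = budget              # max tokens allowed in the window
        self.count_tokens = count_tokens  # assumption: crude whitespace tokenizer
        self.entries: Dict[str, Entry] = {}

    def used(self) -> int:
        return sum(e.tokens for e in self.entries.values())

    def load(self, key: str, relevance: float) -> bool:
        """Fetch `key` from the store, evicting less relevant entries if needed."""
        text = self.store[key]
        tokens = self.count_tokens(text)
        others = sorted((e for e in self.entries.values() if e.key != key),
                        key=lambda e: e.relevance)
        free = self.budget - sum(e.tokens for e in others)
        victims = []
        for e in others:                  # lowest predicted relevance first
            if free >= tokens:
                break
            if e.relevance >= relevance:  # never evict something more relevant
                return False
            victims.append(e.key)
            free += e.tokens
        if free < tokens:
            return False
        for v in victims:
            del self.entries[v]
        self.entries[key] = Entry(key, text, tokens, relevance)
        return True

    def refresh(self, score: Callable[[str], float]) -> None:
        """Re-read resident entries from the store, rescore, then trim to budget."""
        for e in self.entries.values():
            e.text = self.store[e.key]
            e.tokens = self.count_tokens(e.text)
            e.relevance = score(e.text)
        while self.used() > self.budget:
            worst = min(self.entries.values(), key=lambda e: e.relevance)
            del self.entries[worst.key]

    def render(self) -> str:
        """Text to place in the prompt, most relevant first."""
        ordered = sorted(self.entries.values(), key=lambda e: -e.relevance)
        return "\n".join(e.text for e in ordered)
```

Eviction here is driven by the caller's relevance score rather than insertion order, so an old but highly relevant entry survives while a newer, weaker one is dropped. Anything evicted is still in the store and can be loaded again later.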
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T17:28:15.546012+00:00— report_created — created