Report #46423
[architecture] Over-engineering memory retrieval for short single-session tasks
For single-session tasks whose history fits in the model's context window, keep the full conversation history in the context window. Offload to long-term vector memory only when the context limit is approached or the session ends.
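A minimal sketch of this offload policy, assuming a Python service. `count_tokens`, `LongTermStore`, `SessionMemory`, and the 0.8 `high_water` threshold are hypothetical names and values chosen for illustration. Word counts stand in for the model's tokenizer, and a plain list stands in for a vector store, so no embedding or indexing is done. Turns stay in the window by default. The oldest turns move to overflow only once usage passes the threshold, and whatever remains is flushed when the session ends.

```python
from dataclasses import dataclass, field


def count_tokens(text: str) -> int:
    # Hypothetical stand-in: a real system would use the model's own tokenizer.
    return len(text.split())


@dataclass
class LongTermStore:
    # Hypothetical overflow store: a real one would embed and index each item.
    items: list[str] = field(default_factory=list)

    def add(self, text: str) -> None:
        self.items.append(text)


@dataclass
class SessionMemory:
    context_limit: int           # model's context window, in tokens
    store: LongTermStore
    high_water: float = 0.8      # assumed: offload once usage passes 80% of the limit
    history: list[str] = field(default_factory=list)

    def used_tokens(self) -> int:
        return sum(count_tokens(t) for t in self.history)

    def append(self, turn: str) -> None:
        # Primary memory: every turn goes into the context window first.
        self.history.append(turn)
        budget = int(self.context_limit * self.high_water)
        # Only when the limit is approached, move the oldest turns to overflow.
        # The newest turn is always kept, even if it alone exceeds the budget.
        while self.used_tokens() > budget and len(self.history) > 1:
            self.store.add(self.history.pop(0))

    def end_session(self) -> None:
        # At session end, persist whatever is still in the window.
        for turn in self.history:
            self.store.add(turn)
        self.history.clear()
```

For a short session that never reaches the threshold, `store` stays empty until `end_session()` is called, which is the case this report argues for. Evicting oldest-first is one simple choice, not the only one.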
Journey Context:
RAG (retrieval-augmented generation) introduces retrieval latency and the risk of missing context (low recall). If the context window is large enough, passing the full history is strictly better for LLM coherence because the model sees the complete picture. The tradeoff is input token cost vs. retrieval accuracy: resending the full history costs more input tokens on every call, while retrieval saves tokens but can miss relevant turns. Use the context window as the primary memory and external stores as overflow.
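A minimal sketch of the read side under the same primary/overflow split, assuming the write side already keeps `window` under the context limit. `count_tokens`, `retrieve`, and `build_context` are hypothetical. `retrieve` is a toy word-overlap ranker standing in for embedding similarity search against a vector store. When nothing has been offloaded, the full history is returned unchanged and retrieval is skipped. Retrieval, with its latency and recall risk, is paid only for turns that no longer fit.

```python
def count_tokens(text: str) -> int:
    # Hypothetical stand-in for the model's tokenizer.
    return len(text.split())


def retrieve(query: str, overflow: list[str], k: int) -> list[str]:
    # Toy retriever (assumption): ranks offloaded turns by word overlap with
    # the query. A real system would use vector similarity instead.
    q = set(query.lower().split())

    def overlap(text: str) -> int:
        return len(q & set(text.lower().split()))

    ranked = sorted(overflow, key=overlap, reverse=True)
    return [t for t in ranked[:k] if overlap(t) > 0]


def build_context(window: list[str], overflow: list[str], query: str,
                  budget: int, k: int = 2) -> list[str]:
    context = list(window)
    if not overflow:
        # Primary path: the whole history is still in the window,
        # so pass it all and skip retrieval entirely.
        return context
    # Overflow path: recall the top-k offloaded turns that still fit the budget.
    used = sum(count_tokens(t) for t in context)
    recalled = []
    for item in retrieve(query, overflow, k):
        cost = count_tokens(item)
        if used + cost <= budget:
            recalled.append(item)
            used += cost
    # Recalled turns are older, so they go before the in-window history.
    return recalled + context
```

The ranker is deliberately crude, but the structure shows where the recall risk sits: an offloaded turn that `retrieve` does not rank into the top `k`, or that does not fit the remaining budget, never reaches the model.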
Lifecycle
2026-06-19T08:23:49.804537+00:00— report_created — created