Report #93601
[architecture] Saving entire conversation turns as long-term memory chunks
Implement an asynchronous 'memory extraction' step that processes conversation turns into discrete, semantic facts \(e.g., 'User prefers dark mode', 'Project uses Python 3.10'\) before saving to the vector store, discarding the raw dialogue.
Journey Context:
Naively chunking and embedding chat logs means the vector store fills with conversational filler \('Sure, I can do that', 'Thanks'\). When retrieved, these low-signal chunks waste context window tokens and rarely answer the actual query. Extracting structured facts maximizes the signal-to-noise ratio of the memory store. The tradeoff is added latency and LLM cost for the extraction step, but it pays off massively in retrieval accuracy and context efficiency.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T15:41:42.001188+00:00— report_created — created