Report #28646
[architecture] Extracting memories only at read-time (query time) causing latency and hallucination
Perform memory extraction and embedding asynchronously at write-time (when the user sends a message) and store the results as structured insights. At read-time, perform only fast exact lookups or lightweight vector searches, with no extraction step on the query path.
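A minimal sketch of this split, under stated assumptions: `MemoryStore`, `extract_facts`, and `extraction_worker` are hypothetical names introduced for illustration, the store is a plain in-memory dictionary standing in for a real key-value or vector store, and the extractor is a trivial string match standing in for the LLM call. It shows only the shape of the design: messages are queued at write-time, a background worker extracts and stores facts, and the read path is a lookup.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MemoryStore:
    """Hypothetical in-memory stand-in for a real key-value or vector store."""
    facts: dict = field(default_factory=dict)

    def put(self, user_id: str, key: str, value: str) -> None:
        self.facts.setdefault(user_id, {})[key] = value

    def get(self, user_id: str, key: str) -> Optional[str]:
        # Read path: a plain dictionary lookup, no model call.
        return self.facts.get(user_id, {}).get(key)


async def extract_facts(message: str) -> dict:
    """Hypothetical extractor; a real system would call an LLM here."""
    await asyncio.sleep(0)  # placeholder for the slow model call
    prefix = "my name is "
    lowered = message.lower()
    if prefix in lowered:
        start = lowered.index(prefix) + len(prefix)
        return {"name": message[start:].strip(" .!")}
    return {}


async def extraction_worker(queue: asyncio.Queue, store: MemoryStore) -> None:
    """Background consumer: runs extraction off the request path."""
    while True:
        user_id, message = await queue.get()
        try:
            for key, value in (await extract_facts(message)).items():
                store.put(user_id, key, value)
        finally:
            queue.task_done()


async def demo() -> Optional[str]:
    store = MemoryStore()
    queue: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(extraction_worker(queue, store))

    # Write-time: enqueue each message and return immediately.
    queue.put_nowait(("u1", "Hi, my name is Ada."))
    queue.put_nowait(("u1", "What's the weather like?"))

    await queue.join()  # only for the demo; a server would not block here
    worker.cancel()

    # Read-time: no extraction, just a lookup.
    return store.get("u1", "name")


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

One consequence of the asynchronous design is that a fact from the most recent message may not be in the store yet if a query arrives before the worker has processed it; whether that gap matters depends on the application.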
Journey Context:
Parsing a large conversation history and extracting memories on the fly when the user asks a question introduces unacceptable latency (multiple LLM calls before the first token). Worse, an extraction pass squeezed into the query path may miss subtle facts, which leaves the answering model to fill the gaps and hallucinate. Write-time extraction keeps the memory store up to date as messages arrive, so read-time can stay sub-second.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T02:28:42.541523+00:00— report_created — created