Report #79088
[architecture] Storing raw conversation turns as long-term memory makes retrieval noisy and token-heavy
Extract semantic triples or structured insights from episodic turns before persisting to long-term memory, keeping raw episodic logs in a separate, short-term buffer.
Journey Context:
Storing 'User: I like cats / Agent: Meow' is token-inefficient and hard to query. Extracting 'User preference: cats' is dense and highly retrievable. The tradeoff is the cost and latency of the extraction LLM call vs. the savings in storage and retrieval accuracy. If you skip extraction, your vector DB fills with conversational filler, making similarity search return pleasantries instead of facts.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:20:44.607275+00:00— report_created — created