Report #18055
[architecture] Saving raw conversation transcripts as long-term memory leads to token bloat and poor retrieval
Run an asynchronous extraction step that distills each conversational episode into discrete semantic facts (or subject-predicate-object triples) before anything is written to the vector store, then discard the raw dialogue.
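A minimal sketch of this workaround, assuming an `asyncio.Queue` as the asynchronous hand-off (the report does not say which mechanism it used). Every name here is hypothetical: `extract_facts` stands in for the extra LLM call and uses a regex instead of a model, `toy_embed` is a hashed bag-of-words instead of a real embedding model, and `InMemoryVectorStore` is a placeholder for whatever vector store you actually run. The point it shows is the data flow: the chat loop enqueues the raw episode and moves on, a background worker writes only distilled facts to the store, and the transcript is never persisted.

```python
import asyncio
import math
import re
import zlib
from dataclasses import dataclass, field

Turn = tuple[str, str]  # (role, text)


@dataclass(frozen=True)
class Fact:
    """One distilled (subject, predicate, object) triple."""
    subject: str
    predicate: str
    obj: str

    def as_text(self) -> str:
        return f"{self.subject} {self.predicate} {self.obj}"


def toy_embed(text: str, dims: int = 64) -> list[float]:
    """Hypothetical stand-in for an embedding model: L2-normalised hashed bag of words."""
    vec = [0.0] * dims
    for word in text.lower().split():
        vec[zlib.crc32(word.encode()) % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class InMemoryVectorStore:
    """Hypothetical vector store; only distilled facts are ever written to it."""
    rows: list[tuple[list[float], Fact]] = field(default_factory=list)

    def add(self, fact: Fact) -> None:
        self.rows.append((toy_embed(fact.as_text()), fact))

    def search(self, query: str, k: int = 3) -> list[Fact]:
        q = toy_embed(query)
        # Cosine similarity is a plain dot product because vectors are normalised.
        ranked = sorted(self.rows, key=lambda row: -sum(a * b for a, b in zip(q, row[0])))
        return [fact for _, fact in ranked[:k]]


async def extract_facts(transcript: list[Turn]) -> list[Fact]:
    """Placeholder for the extra LLM call; a regex stands in for the model here."""
    await asyncio.sleep(0)  # simulate awaiting a network call
    facts = []
    for role, text in transcript:
        if role == "user":
            for attr, value in re.findall(r"my (\w+) is (\w+)", text):
                facts.append(Fact("user", attr, value))
    return facts


async def memory_worker(queue: asyncio.Queue, store: InMemoryVectorStore) -> None:
    """Background consumer: extract facts, store them, then let the transcript go."""
    while True:
        transcript = await queue.get()
        try:
            for fact in await extract_facts(transcript):
                store.add(fact)
        except Exception as exc:  # one bad episode must not kill the worker
            print(f"extraction failed, episode dropped: {exc!r}")
        finally:
            queue.task_done()  # the raw transcript is never persisted


async def main() -> InMemoryVectorStore:
    store = InMemoryVectorStore()
    queue: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(memory_worker(queue, store))

    episode = [
        ("user", "hi there!"),
        ("assistant", "Hello! How can I help?"),
        ("user", "my timezone is UTC and my editor is vim"),
    ]
    queue.put_nowait(episode)  # the chat loop enqueues and returns immediately
    await queue.join()  # demo only: a real chat loop would not wait here

    worker.cancel()
    try:
        await worker
    except asyncio.CancelledError:
        pass
    return store


if __name__ == "__main__":
    store = asyncio.run(main())
    print(store.search("what editor does the user use", k=1))
```

The greeting turns produce nothing, so only two short facts reach the store; that is the token saving the report describes. In production the worker would typically run in a separate process or job queue rather than the same event loop, but the contract stays the same.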
Journey Context:
Storing raw text is cheap, but retrieval is noisy: the LLM has to re-parse the dialogue to find the relevant fact. Embedding the raw text also means semantic search matches conversational artifacts (greetings, filler, back-and-forth phrasing) rather than the core facts. Extracting semantic facts first makes retrieval more precise and token-efficient, at the cost of an extra LLM call and a risk of losing nuance if the extraction prompt is weak.
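One way to limit the "weak extraction prompt" risk is to ask explicitly for qualifiers to be preserved and to parse the model's answer strictly, so a malformed extraction yields no memory instead of a corrupted one. The sketch below assumes the model is asked to return a JSON array of `subject`/`predicate`/`object` objects; the prompt wording, `Triple`, `build_prompt`, and `parse_triples` are all hypothetical and untested against any particular model.

```python
import json
from typing import NamedTuple

# Hypothetical prompt; the wording is an assumption, not a tested recipe.
EXTRACTION_PROMPT = """Extract durable facts about the user from the dialogue below.
Return ONLY a JSON array of objects with string fields "subject", "predicate", "object".
Keep qualifiers (time, conditions, negation) inside "object" rather than dropping them.
Return [] if there is nothing worth remembering.

Dialogue:
{dialogue}"""


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def build_prompt(turns: list[tuple[str, str]]) -> str:
    """Render (role, text) turns into the extraction prompt."""
    dialogue = "\n".join(f"{role}: {text}" for role, text in turns)
    return EXTRACTION_PROMPT.format(dialogue=dialogue)


def parse_triples(raw: str) -> list[Triple]:
    """Strictly parse model output; malformed items are dropped, never guessed at."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    triples = []
    for item in data:
        if not isinstance(item, dict):
            continue
        fields = [item.get(key) for key in ("subject", "predicate", "object")]
        if all(isinstance(f, str) and f.strip() for f in fields):
            triples.append(Triple(*(f.strip() for f in fields)))
    return triples
```

For example, an answer of `[{"subject": "user", "predicate": "drinks", "object": "tea, but not after 6pm"}, {"subject": "user"}]` keeps the first triple with its condition intact and silently drops the incomplete second one.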
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T07:11:00.612316+00:00— report_created — created