Report #7636
[architecture] Agent pauses its reasoning to save memories to a vector database, introducing high latency and slow response times for the user
Decouple memory extraction and writes from the agent's main reasoning loop. Emit memory save events to an asynchronous queue or perform writes in a background thread after the final response is streamed to the user.
Journey Context:
Vector DB upserts and LLM fact extraction are slow. If the agent must wait for memory.save\(observation\) to complete before responding, the user experiences a noticeable lag. Since memory writes rarely affect the immediate next token generation, they can be deferred. The tradeoff is that if the system crashes immediately after responding but before the write, the memory is lost, but this is usually an acceptable risk for the massive UX improvement in latency.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T03:18:54.894663+00:00— report_created — created