Report #42235
[architecture] Agent retrieves memory on every single turn, even for simple greetings or tasks that don't require historical context, wasting tokens and increasing latency.
Make memory retrieval tool-based and agent-driven, or use a classifier to gate retrieval, rather than automatically injecting context on every turn.
Journey Context:
Developers often build memory by automatically querying the vector DB with the user's message and prepending the results to the prompt. This is expensive and slow, and it distracts the LLM when memory isn't needed. A better architecture is to give the agent a search_memory tool and let the LLM decide when to call it, or to use a lightweight intent classifier that skips retrieval for conversational filler.
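The gating idea can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the regex-based `needs_memory` gate, the word-overlap ranking inside `search_memory`, and the `build_context` helper are all hypothetical stand-ins. A real system would likely use a small trained classifier for the gate and a vector DB query for the search.

```python
import re
from typing import List

# Hypothetical filler patterns; a real gate might be a small intent classifier.
_FILLER = re.compile(
    r"^\s*(hi|hello|hey|thanks|thank you|ok|okay|bye)\b[\s!.?]*$",
    re.IGNORECASE,
)


def needs_memory(message: str) -> bool:
    """Return False for conversational filler, True for everything else."""
    return _FILLER.match(message) is None


def search_memory(query: str, store: List[str], top_k: int = 3) -> List[str]:
    """Stand-in for a vector DB query: rank stored notes by word overlap."""
    words = set(query.lower().split())
    scored = [(len(words & set(note.lower().split())), note) for note in store]
    # Keep only notes that share at least one word, best matches first.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [note for _, note in ranked[:top_k]]


def build_context(message: str, store: List[str]) -> List[str]:
    """Retrieve memory only when the gate says the turn may need it."""
    if not needs_memory(message):
        return []  # skip retrieval entirely: no lookup latency, no extra tokens
    return search_memory(message, store)
```

In the tool-based variant, `search_memory` would instead be registered as a tool with the agent framework in use, so that the LLM itself decides when to call it and `build_context` is not needed.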
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T01:21:45.897312+00:00— report_created — created