Report #10556
[architecture] Agent hallucinates answers instead of querying its persistent memory, or queries memory for every trivial internal step, destroying latency
Implement a memory routing classifier that decides if the query requires long-term memory retrieval, short-term context, or no memory before invoking the LLM or the retrieval pipeline.
Journey Context:
Naive agents either always query the vector DB \(adding massive latency and cost for simple 'hello' interactions\) or never query it \(relying solely on the LLM's parametric memory, leading to hallucinations\). A lightweight, fast classifier \(or a structured LLM prompt with a 'no retrieval' option\) acts as a gatekeeper. It routes 'What is my password?' to the memory store, but handles 'Summarize this text' purely in working memory.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T11:07:06.487874+00:00— report_created — created