Report #1433
[architecture] Vector similarity search fails to find memories that require connecting multiple indirect facts. How to do multi-hop retrieval?
Instead of relying on a single embedding search, use the LLM to generate targeted search queries iteratively from the current context: retrieve initial facts, use them to formulate a follow-up query, repeat as needed, and then synthesize the final answer from everything retrieved.
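A minimal sketch of that loop follows, under loud assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, `fake_llm_next_query` is a hypothetical stub where an actual LLM call would go, and the memories and dates are made up for illustration. It is meant to show the shape of the iteration, not a production implementation.

```python
import datetime
import math
import re
from collections import Counter
from typing import Callable, List, Optional

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: bag of lowercase tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def search(query: str, memories: List[str], k: int = 1, exclude=()) -> List[str]:
    """Single-hop search: the k memories most similar to the query."""
    candidates = [m for m in memories if m not in exclude]
    q = embed(query)
    return sorted(candidates, key=lambda m: cosine(q, embed(m)), reverse=True)[:k]

def multi_hop_retrieve(
    question: str,
    memories: List[str],
    next_query: Callable[[str, List[str]], Optional[str]],
    max_hops: int = 2,
    k: int = 1,
) -> List[str]:
    """Retrieve, let next_query (the LLM's role) form a follow-up query, repeat."""
    facts: List[str] = []
    query: Optional[str] = question
    for _ in range(max_hops):
        hits = search(query, memories, k=k, exclude=facts)
        if not hits:
            break
        facts.extend(hits)
        query = next_query(question, facts)
        if query is None:  # the query generator decided it has enough facts
            break
    return facts

def fake_llm_next_query(question: str, facts: List[str]) -> Optional[str]:
    """Hypothetical stub for an LLM call: after the first fact, look for the
    bug fixed on the day before the date mentioned in that fact."""
    if len(facts) != 1:
        return None
    match = re.search(r"\d{4}-\d{2}-\d{2}", facts[0])
    if match is None:
        return None
    day_before = datetime.date.fromisoformat(match.group()) - datetime.timedelta(days=1)
    return f"fixed bug {day_before.isoformat()}"

# Made-up example memories.
memories = [
    "2026-06-08: Shipped the API change that renames the /v1/users endpoint.",
    "2026-06-07: Fixed a null pointer bug in the session cache.",
    "2026-06-01: Updated the README badges.",
]
question = "What bug did I fix right before the API change last week?"

single_hop = search(question, memories, k=1)
facts = multi_hop_retrieve(question, memories, fake_llm_next_query)
```

In this toy setup the single-hop search surfaces only the API-change memory, while the second hop, driven by the date found in that memory, reaches the bug-fix memory. The synthesis step is left out: in practice the collected `facts` would be passed back to the LLM together with the original question to produce the final answer.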
Journey Context:
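Standard vector search is single-hop: it returns the documents closest to the query embedding. If the user asks 'What bug did I fix right before the API change last week?', the answer depends on two memories: the API change, which the query resembles, and the bug fix that preceded it, which the query may barely resemble. A single query embedding is unlikely to match both. Developers often respond by increasing top-k, which mostly adds noise without guaranteeing that the second memory is retrieved. The tradeoff is latency vs. recall: multi-hop retrieval costs an extra search and LLM round trip per hop, but it can follow the chain from one memory to the next. Graph-based memory is an alternative, but iterative vector search is often easier to implement and sufficiently effective.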
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-14T22:31:00.041413+00:00— report_created — created