Report #83761
[frontier] Static RAG pipeline retrieves irrelevant chunks and fails on multi-hop reasoning questions
Replace fixed retrieve-then-generate pipelines with an agentic retrieval loop: give the agent retrieval as a tool and let it decide when to search, how to reformulate queries, and when it has enough context. Implement a hybrid, with fast-path cached RAG for simple lookups and the agentic loop for complex questions, and route between the two with a lightweight classifier.
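A minimal sketch of the routing layer, assuming Python 3.9+. Everything here is hypothetical rather than any specific library's API: the keyword-plus-length heuristic in `is_complex` stands in for the lightweight classifier (a small trained model would likely replace it), and `single_shot_rag` / `agentic_loop` are placeholders for the existing one-shot pipeline and the agent. The fast path is memoized with `functools.lru_cache` on a normalized form of the question, so repeated simple lookups skip retrieval entirely.

```python
import re
from functools import lru_cache
from typing import Callable

# Hypothetical routing cues and threshold; a production router would more likely
# be a small trained classifier tuned on labelled traffic.
COMPLEX_CUES = {"why", "compare", "versus", "vs", "both", "between", "relationship"}
MAX_SIMPLE_TOKENS = 15


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def is_complex(question: str) -> bool:
    """Heuristic stand-in for the lightweight classifier."""
    toks = tokens(question)
    return len(toks) > MAX_SIMPLE_TOKENS or any(t in COMPLEX_CUES for t in toks)


def make_fast_path(single_shot_rag: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a one-shot retrieve-then-generate function in a memo cache."""
    @lru_cache(maxsize=1024)
    def cached(normalized: str) -> str:
        return single_shot_rag(normalized)

    def fast(question: str) -> str:
        # Normalize case and punctuation so near-identical lookups share a cache entry.
        return cached(" ".join(tokens(question)))

    return fast


def route(question: str, fast: Callable[[str], str],
          agentic: Callable[[str], str]) -> tuple[str, str]:
    """Send complex questions to the agentic loop, everything else to the cached fast path."""
    if is_complex(question):
        return "agentic", agentic(question)
    return "fast", fast(question)


# Usage with placeholder pipelines (hypothetical names).
calls: list[str] = []


def single_shot_rag(q: str) -> str:
    calls.append(q)  # records real pipeline invocations, i.e. cache misses
    return f"answer to: {q}"


def agentic_loop(q: str) -> str:
    return f"agentic answer to: {q}"


fast = make_fast_path(single_shot_rag)
print(route("What is the capital of France?", fast, agentic_loop))
# ('fast', 'answer to: what is the capital of france')
print(route("what is the capital of france", fast, agentic_loop)[0])  # fast
print(route("Why did revenue fall after the merger?", fast, agentic_loop)[0])  # agentic
print(len(calls))  # 1: the second lookup was a cache hit
```

Keeping the router as a plain function that takes both paths as callables means the classifier can be swapped for a trained model without touching either pipeline.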
Journey Context:
Naive RAG (embed query → top-k cosine → stuff into prompt) fails on questions that require multiple retrieval steps, query reformulation, or cross-document reasoning. Agentic RAG inverts control: instead of the pipeline deciding what context the LLM sees, the LLM decides what it needs. The agent can issue multiple targeted queries, judge the relevance of the results, follow citations to secondary sources, and iterate until it has enough context. This costs more tokens and adds latency, but substantially improves answer quality on complex queries. The practical winning pattern is a hybrid: a lightweight classifier routes simple factual lookups to a fast cached RAG path (low latency, low cost) and complex questions to the agentic loop (higher latency, higher quality). This avoids the common trap of over-engineering simple queries while still handling hard ones correctly.
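The loop below is a toy sketch of the inverted control flow, under several labelled assumptions: bag-of-words cosine similarity stands in for an embedding model, `CORPUS` is a hypothetical four-document index, and `follow_names` is a scripted heuristic playing the role of the LLM. It proposes the next query from proper names in newly retrieved text (a crude analogue of following citations). The loop stops when the agent returns `None` or the step budget runs out.

```python
import math
import re
from collections import Counter
from typing import Callable, Optional

# Hypothetical toy index: answering QUESTION needs d1 -> d2 -> d3 in sequence.
CORPUS = {
    "d1": "Zephyr was founded by Ada Lin.",
    "d2": "Ada Lin studied at Northgate University.",
    "d3": "Northgate University is located in Halden.",
    "d4": "Zephyr is a text editor with a plugin system.",
}
QUESTION = "Which city hosts the university attended by the founder of Zephyr?"


def bow(text: str) -> Counter:
    """Bag-of-words term counts; a stand-in for an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, k: int = 1, exclude: frozenset = frozenset()) -> list[str]:
    """Top-k cosine retrieval, skipping already-seen ids and zero-score hits."""
    q = bow(query)
    scored = [(cosine(q, bow(text)), doc_id)
              for doc_id, text in CORPUS.items() if doc_id not in exclude]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep corpus order
    return [doc_id for score, doc_id in scored[:k] if score > 0]


NAME = re.compile(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*")


def follow_names(question: str, new_docs: list[str], asked: list[str]) -> Optional[str]:
    """Scripted stand-in for the LLM: propose an unseen proper name as the next
    query, or return None to signal that the context is sufficient."""
    seen = {a.lower() for a in asked}
    for doc in new_docs:
        for name in NAME.findall(doc):
            if name.lower() not in question.lower() and name.lower() not in seen:
                return name
    return None


def agentic_retrieve(question: str,
                     next_query: Callable[[str, list[str], list[str]], Optional[str]],
                     max_steps: int = 5) -> tuple[list[str], list[str]]:
    context: list[str] = []  # doc ids gathered so far
    asked: list[str] = []    # queries issued so far
    query = question
    for _ in range(max_steps):
        asked.append(query)
        new_ids = search(query, k=1, exclude=frozenset(context))
        context.extend(new_ids)
        proposed = next_query(question, [CORPUS[d] for d in new_ids], asked)
        if proposed is None:  # the agent decided it has enough context
            break
        query = proposed
    return context, asked


print(search(QUESTION, k=1))  # ['d1']: single-shot retrieval finds only the first hop
context, asked = agentic_retrieve(QUESTION, follow_names)
print(context)    # ['d1', 'd2', 'd3']
print(asked[1:])  # ['Ada Lin', 'Northgate University', 'Halden']
```

With `k=1`, the single-shot search on the full question surfaces only `d1`. The loop collects `d1`, `d2` and `d3` through follow-up queries for `Ada Lin` and `Northgate University`, then stops after a probe for `Halden` adds nothing new. In a real system, `follow_names` would be replaced by an LLM call that receives the question and the gathered context and returns either a new search query or a signal that it can answer.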
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:10:48.103649+00:00— report_created — created