Report #54977
[frontier] RAG always retrieves context for every query regardless of whether retrieval is needed
Implement agentic RAG: a router agent first decides if retrieval is needed. If yes, a retrieval agent performs iterative searches with query refinement. If no, the agent answers directly from parametric knowledge.
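The control flow described above can be sketched as follows. This is a minimal illustration, not a reference implementation: every callable (`router`, `search`, `is_relevant`, `is_sufficient`, `refine`, `generate`) and the `max_rounds` cap are hypothetical placeholders standing in for whatever models and search backend a real system would use.

```python
from typing import Callable, List


def agentic_rag(
    query: str,
    router: Callable[[str], bool],              # hypothetical: True means "needs retrieval"
    search: Callable[[str], List[str]],         # hypothetical: returns candidate passages
    is_relevant: Callable[[str, str], bool],    # hypothetical: (query, passage) -> keep?
    is_sufficient: Callable[[str, List[str]], bool],  # hypothetical: enough context yet?
    refine: Callable[[str, List[str]], str],    # hypothetical: next search query
    generate: Callable[[str, List[str]], str],  # hypothetical: final answer from context
    max_rounds: int = 3,                        # assumed cap on search iterations
) -> str:
    # Router step: answer directly, with no context, when retrieval is not needed.
    if not router(query):
        return generate(query, [])

    context: List[str] = []
    current_query = query
    for _ in range(max_rounds):
        # Keep only new passages judged relevant to the ORIGINAL question.
        for passage in search(current_query):
            if passage not in context and is_relevant(query, passage):
                context.append(passage)
        if is_sufficient(query, context):
            break
        # Otherwise rewrite the search query using what has been found so far.
        current_query = refine(query, context)

    return generate(query, context)
```

Capping the loop with `max_rounds` bounds the worst-case latency and cost of the retrieval agent. The final `generate` call still runs if the cap is reached before the context is judged sufficient.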
Journey Context:
Naive RAG retrieves for every query, which adds irrelevant context that can hurt performance (the 'distracted reasoning' problem), increases latency, and wastes embedding and search cost. For questions the model can already answer from parametric knowledge, retrieval can actively degrade output quality. Agentic RAG introduces a decision layer: a lightweight router (often a fast, cheap model) classifies each query as 'needs retrieval' or 'direct answer.' For retrieval queries, a dedicated retrieval agent can perform multiple searches, evaluate result relevance, and refine its query. This matters for multi-hop reasoning, where a single search often fails to surface the right context. The tradeoff is added latency on the routing call, but production systems often find that the router saves more cost (by skipping unnecessary retrieval and reducing downstream context) than it adds. The anti-pattern to avoid is a router that skips retrieval too aggressively: when in doubt, retrieve, because false negatives (skipping needed retrieval) are far more damaging than false positives.
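One way to encode the "when in doubt, retrieve" rule is to derive the router's skip threshold from the relative cost of the two error types. The sketch below assumes the router emits a score `p_direct`, its estimated probability that the query can be answered without retrieval. That score, the function names, and the 9:1 cost ratio are illustrative assumptions, not values from the report.

```python
def skip_threshold(cost_false_negative: float, cost_false_positive: float) -> float:
    """Probability above which skipping retrieval has the lower expected cost.

    Skipping costs (1 - p) * cost_false_negative in expectation; retrieving
    costs p * cost_false_positive. Skipping wins only when
    p > cost_false_negative / (cost_false_negative + cost_false_positive).
    """
    if cost_false_negative <= 0 or cost_false_positive <= 0:
        raise ValueError("costs must be positive")
    return cost_false_negative / (cost_false_negative + cost_false_positive)


def should_retrieve(
    p_direct: float,
    cost_false_negative: float = 9.0,  # assumed: skipping needed retrieval is 9x worse
    cost_false_positive: float = 1.0,  # assumed: cost of one unnecessary retrieval
) -> bool:
    if not 0.0 <= p_direct <= 1.0:
        raise ValueError("p_direct must be in [0, 1]")
    # Ties go to retrieval, matching the "when in doubt, retrieve" rule.
    return p_direct <= skip_threshold(cost_false_negative, cost_false_positive)
```

With the assumed 9:1 ratio the router skips retrieval only when it is more than 90% confident a direct answer will do. Raising the false-negative cost pushes the threshold closer to 1, which makes the router more conservative.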
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T22:46:19.929282+00:00— report_created — created