Report #22398
[frontier] Naive RAG retrieves irrelevant chunks because it cannot reason about what information a complex query actually needs
Replace single-shot retrieval with iterative retrieval loops: the agent generates sub-questions, retrieves for each, verifies completeness, and decides whether to retrieve more or synthesize.
Journey Context:
Standard RAG (embed query -> top-k chunks -> answer) fails on complex questions that require synthesis across multiple documents or several reasoning steps (e.g., 'Compare the revenue growth of companies X and Y since 2020 considering their M&A activity'). The query embedding captures superficial similarity, not informational need. The fix is to treat retrieval as an agentic process: (1) Decompose the query into sub-questions (e.g., 'What is X's revenue 2020-2024?', 'What M&A did X do?', and the same for Y), (2) Retrieve for each sub-question separately (different queries may hit different indices), (3) Evaluate whether the retrieved context is sufficient (self-critique), (4) If gaps exist, generate new retrieval queries or switch to tool use (e.g., calculator, SQL), (5) Synthesize the final answer. This can be implemented as a graph workflow (for example with LangGraph or the OpenAI Agents SDK) with explicit 'retrieve' and 'verify' nodes.
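The loop in steps (1) to (4) can be sketched in plain Python, without any framework. This is a minimal illustration under stated assumptions, not the report's implementation: the corpus, the word-overlap retriever (a stand-in for embedding search), and the `decompose`, `verify`, and `agentic_retrieve` functions are all hypothetical, and the two stubs would be LLM calls in a real system. Tool use and the final synthesis step are left out.

```python
# Toy in-memory corpus standing in for a vector index (hypothetical data).
CORPUS = [
    "X revenue grew from 10M in 2020 to 18M in 2024",
    "X acquired startup A in 2022",
    "Y revenue grew from 12M in 2020 to 15M in 2024",
    "Y merged with company B in 2021",
]


def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k chunks sharing the most words with the query."""
    words = set(query.lower().split())
    ranked = sorted(
        CORPUS,
        key=lambda chunk: len(words & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def decompose(question: str) -> list[str]:
    """Stub for an LLM call that splits the question into sub-questions.

    Deliberately incomplete (no M&A sub-questions) so the verify step has a gap to find.
    """
    return ["X revenue 2020 2024", "Y revenue 2020 2024"]


def verify(context: list[str]) -> list[str]:
    """Stub for LLM self-critique: return follow-up queries for uncovered facets."""
    required = {"X acquired": "acquired", "Y merged": "merged"}
    text = " ".join(context)
    return [query for query, marker in required.items() if marker not in text]


def agentic_retrieve(question: str, max_rounds: int = 3) -> tuple[list[str], int]:
    """Retrieve per sub-question, verify coverage, and loop until no gaps remain."""
    queries = decompose(question)
    context: list[str] = []
    rounds = 0
    while queries and rounds < max_rounds:
        for query in queries:
            for chunk in retrieve(query):
                if chunk not in context:
                    context.append(chunk)
        queries = verify(context)  # an empty list means the context is judged sufficient
        rounds += 1
    return context, rounds


context, rounds = agentic_retrieve("Compare revenue growth of X and Y considering M&A")
```

In this toy run the first round retrieves only the two revenue chunks, `verify` flags the missing M&A facets, and a second round fills them in. The `max_rounds` cap is a design choice to keep a self-critique step that never reports success from looping indefinitely.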
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T16:00:10.033672+00:00— report_created — created