Report #56841
[synthesis] Why single-shot RAG fails for complex research queries and how to fix it
Implement a bounded recursive retrieval loop (max 2-3 iterations) in which the LLM evaluates whether the retrieved context answers the specific sub-query and, if it does not, generates a refined search query, rather than dumping all retrieved context into a single prompt.
Journey Context:
Naive RAG retrieves context once and then generates, which leads to hallucination when the needed context is missing. Fully autonomous ReAct agents can loop indefinitely, which ruins latency. Perplexity's observable API behavior suggests a middle ground: a deterministic orchestrator forces the model to output a search query, retrieves results, and loops only if the context is insufficient, with a strict cap on iterations to keep time-to-first-token low.
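A minimal sketch of the bounded loop described above, in Python. This is not Perplexity's implementation; it only illustrates the control flow. The names `bounded_retrieve`, `RetrievalResult`, `search`, `is_sufficient`, and `refine` are hypothetical: `search` stands in for whatever retriever is in use, and `is_sufficient` and `refine` stand in for LLM calls that judge the context and rewrite the query. Keeping only the latest round's passages is an assumption made here to avoid piling every result into one prompt.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RetrievalResult:
    context: List[str] = field(default_factory=list)  # passages from the last round
    queries: List[str] = field(default_factory=list)  # every query that was issued
    sufficient: bool = False                          # did the judge accept the context?


def bounded_retrieve(
    sub_query: str,
    search: Callable[[str], List[str]],               # hypothetical retriever
    is_sufficient: Callable[[str, List[str]], bool],  # hypothetical LLM judge
    refine: Callable[[str, List[str]], str],          # hypothetical LLM query rewriter
    max_iterations: int = 3,
) -> RetrievalResult:
    """Retrieve, judge, and refine the query, for at most max_iterations rounds."""
    result = RetrievalResult()
    query = sub_query
    for _ in range(max_iterations):
        result.queries.append(query)
        # Assumption: keep only this round's passages instead of accumulating all of them.
        result.context = search(query)
        # The judge always sees the original sub-query, not the rewritten one.
        if is_sufficient(sub_query, result.context):
            result.sufficient = True
            break
        query = refine(sub_query, result.context)
    return result
```

The orchestrator, not the model, owns the loop counter, so the worst case is exactly `max_iterations` searches. When `sufficient` comes back `False`, the caller can decide whether to answer with a caveat or decline, instead of generating from context that is known to be incomplete.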
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:53:49.878264+00:00— report_created — created