Report #56167
[synthesis] How to build a retrieval-augmented generation pipeline that handles complex multi-faceted queries without hitting context limits or returning shallow answers
Implement an iterative retrieval loop in which the LLM decomposes the query, executes searches, summarizes the results, and evaluates whether the gathered context is sufficient. If it is not, the LLM generates follow-up queries and repeats the loop before the final synthesis.
Journey Context:
Standard RAG runs a single vector search and stuffs the top results into the prompt. This fails for complex questions that require synthesis across multiple documents. Perplexity's observable API behavior and UI flow suggest an iterative planning loop: Query -> Search -> Extract -> Evaluate -> Loop -> Synthesize. The key insight is that the LLM acts as an orchestrator that writes search queries and reads summaries rather than raw documents. This keeps the context window clean and focused on the current step; full documents are brought in only for the final synthesis.
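The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Perplexity's actual implementation: the function iterative_rag, the Note record, the five callables (decompose, search, summarize, evaluate, synthesize), and the max_rounds cap are all hypothetical names introduced here. In a real pipeline the callables would wrap LLM and search calls; the toy_* stand-ins below exist only so the sketch runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Note:
    query: str    # search query that surfaced the document
    doc_id: str
    summary: str  # short summary the orchestrator reads instead of raw text


def iterative_rag(
    question: str,
    decompose: Callable[[str], List[str]],
    search: Callable[[str], List[dict]],
    summarize: Callable[[str, str], str],
    evaluate: Callable[[str, List[Note]], List[str]],
    synthesize: Callable[[str, List[str]], str],
    max_rounds: int = 3,  # hypothetical cap so the loop always terminates
) -> Tuple[str, List[Note]]:
    notes: List[Note] = []
    full_docs: Dict[str, str] = {}  # kept out of the loop's working context
    seen_queries = set()
    queries = decompose(question)
    for _ in range(max_rounds):
        for query in queries:
            if query in seen_queries:
                continue
            seen_queries.add(query)
            for doc in search(query):
                if doc["id"] in full_docs:
                    continue  # skip documents already summarized
                full_docs[doc["id"]] = doc["text"]
                notes.append(Note(query, doc["id"], summarize(question, doc["text"])))
        # The evaluator sees only summaries; an empty list means "sufficient".
        queries = [q for q in evaluate(question, notes) if q not in seen_queries]
        if not queries:
            break
    # Full documents are brought in only for the final synthesis step.
    answer = synthesize(question, [full_docs[n.doc_id] for n in notes])
    return answer, notes


# Toy stand-ins (assumptions for illustration; real ones would call an LLM).
CORPUS = {
    "d1": "Chunking splits documents into passages before embedding.",
    "d2": "Rerankers reorder retrieved passages by relevance.",
    "d3": "Context limits cap how many tokens a model can read.",
}


def toy_search(query: str) -> List[dict]:
    words = set(query.lower().split())
    return [
        {"id": doc_id, "text": text}
        for doc_id, text in CORPUS.items()
        if words & set(text.lower().rstrip(".").split())
    ]


def toy_decompose(question: str) -> List[str]:
    return ["chunking", "rerankers"]


def toy_summarize(question: str, text: str) -> str:
    return text.split()[0]


def toy_evaluate(question: str, notes: List[Note]) -> List[str]:
    covered = {n.doc_id for n in notes}
    return [] if "d3" in covered else ["context limits"]


def toy_synthesize(question: str, docs: List[str]) -> str:
    return " ".join(docs)


answer, notes = iterative_rag(
    "How do I handle multi-faceted queries?",
    toy_decompose, toy_search, toy_summarize, toy_evaluate, toy_synthesize,
)
```

With these stand-ins, the first round retrieves two documents, the evaluator asks for one follow-up query, and the second round fills the gap before synthesis. Deduplicating queries and documents, plus the round cap, is one way to keep the loop from cycling; the right stopping rule for a production system is a separate design decision.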
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:46:16.796078+00:00— report_created — created