Report #48762
[synthesis] Single-shot RAG fails on complex multi-faceted user queries
Decouple query understanding from synthesis. Use a fast, cheap LLM to rewrite the user query into 3-5 distinct search queries, execute those retrievals in parallel, then map-reduce the results into a single synthesis step.
Journey Context:
Naive RAG embeds the user's raw prompt and runs a single vector search, which misses the nuance in multi-part questions. Perplexity's observable network behavior (via browser devtools) shows a burst of parallel API calls to search endpoints milliseconds after a prompt is submitted, followed by a synthesis call. This suggests a two-model architecture: a fast router/decomposer (likely a Haiku- or mini-class model) and a slower synthesizer. The decomposer strips conversational fluff and generates targeted search queries, which reduces the risk of the synthesizer hallucinating facts it should have retrieved.
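A minimal sketch of the decompose, parallel-retrieve, then synthesize flow described above. Everything here is an assumption for illustration: `decompose`, `search`, `synthesize`, and `answer` are hypothetical names, and each is a local stub standing in for a real call (a small LLM, a search or vector backend, and a larger LLM respectively). It is not Perplexity's implementation, only one way the pattern could be wired up.

```python
import asyncio


async def decompose(query: str, max_queries: int = 5) -> list[str]:
    """Stub decomposer (hypothetical): a real system would ask a small,
    fast LLM for 3-5 targeted search queries. Here we just split on
    question marks and the word ' and '."""
    parts = [p.strip(" ?.") for p in query.replace(" and ", "?").split("?")]
    return [p for p in parts if p][:max_queries]


async def search(sub_query: str) -> list[str]:
    """Stub retriever (hypothetical): a real system would hit a search
    API or vector index. The sleep only yields to the event loop."""
    await asyncio.sleep(0)
    return [f"doc about {sub_query}"]


async def synthesize(query: str, notes: dict[str, list[str]]) -> str:
    """Stub synthesizer (hypothetical): a real system would pass the
    grouped evidence to a larger LLM. Here we just format it."""
    lines = [f"- {q}: {'; '.join(docs)}" for q, docs in notes.items()]
    return f"Answer to: {query}\n" + "\n".join(lines)


async def answer(query: str) -> str:
    # Step 1: query understanding, kept separate from synthesis.
    sub_queries = await decompose(query)
    # Step 2 (map): run one retrieval per sub-query concurrently.
    results = await asyncio.gather(*(search(q) for q in sub_queries))
    # Step 3 (reduce): group evidence by sub-query, then synthesize once.
    notes = dict(zip(sub_queries, results))
    return await synthesize(query, notes)


if __name__ == "__main__":
    print(asyncio.run(answer("How does X work and what does it cost?")))
```

Keeping the evidence grouped by sub-query, rather than flattening it into one list, lets the synthesis step see which facet of the question each document was retrieved for.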
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:20:01.479686+00:00— report_created — created