Report #39283
[synthesis] Monolithic RAG pipelines returning irrelevant context for complex user queries
Implement an iterative retrieval loop where the LLM decomposes the query, searches, evaluates results, and re-queries before synthesis.
Journey Context:
Standard RAG performs a single vector search followed by generation. Perplexity's API behavior and architecture indicate that production search requires query decomposition (breaking a complex question into sub-questions), multiple search iterations, and reading specific extracted chunks before synthesizing. The tradeoff is higher latency and token cost per query, but the signal-to-noise ratio of the final context window improves substantially, which reduces hallucinations caused by forcing synthesis from insufficient data.
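A minimal sketch of the decompose, search, evaluate, re-query loop described above. This is an illustration under stated assumptions, not Perplexity's implementation: the names iterative_retrieve, RetrievalResult, and every toy_* helper are hypothetical, and the toy helpers use keyword overlap on a three-line in-memory corpus where a real pipeline would call an LLM and a vector store.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RetrievalResult:
    sub_query: str
    chunks: List[str] = field(default_factory=list)
    rounds: int = 0  # how many search rounds this sub-query used


def iterative_retrieve(
    query: str,
    decompose: Callable[[str], List[str]],
    search: Callable[[str], List[str]],
    is_sufficient: Callable[[str, List[str]], bool],
    refine: Callable[[str, List[str]], str],
    max_rounds: int = 3,
) -> List[RetrievalResult]:
    """Decompose the query, then search/evaluate/re-query each sub-query."""
    results = []
    for sub_query in decompose(query):
        current, kept, rounds = sub_query, [], 0
        while rounds < max_rounds:  # hard cap bounds latency and token cost
            rounds += 1
            for chunk in search(current):
                if chunk not in kept:  # de-duplicate across rounds
                    kept.append(chunk)
            if is_sufficient(sub_query, kept):
                break
            current = refine(sub_query, kept)  # re-query with a new phrasing
        results.append(RetrievalResult(sub_query, kept, rounds))
    return results


# --- Hypothetical stand-ins; in practice these would be LLM / vector-store calls ---
CORPUS = [
    "Vector search ranks chunks by embedding similarity.",
    "Query decomposition splits a complex question into sub-questions.",
    "Each extra retrieval round adds latency and token cost.",
]


def toy_decompose(query: str) -> List[str]:
    # Assumption: sub-questions are joined by " and ".
    return [part.strip() for part in query.split(" and ") if part.strip()]


def toy_search(q: str) -> List[str]:
    # Keyword overlap as a stand-in for embedding similarity.
    terms = set(q.lower().split())
    return [c for c in CORPUS if terms & set(c.lower().rstrip(".").split())]


def toy_is_sufficient(sub_query: str, chunks: List[str]) -> bool:
    # Assumption: one matching chunk is enough; a real judge would be an LLM call.
    return len(chunks) >= 1


def toy_refine(sub_query: str, chunks: List[str]) -> str:
    # Assumption: widen the query with a fixed extra term.
    return sub_query + " retrieval"


if __name__ == "__main__":
    for r in iterative_retrieve(
        "decomposition and expense",
        toy_decompose, toy_search, toy_is_sufficient, toy_refine,
    ):
        print(r.sub_query, r.rounds, r.chunks)
```

Only the chunks that survive the sufficiency check would be passed to the synthesis prompt. The max_rounds cap is the knob for the latency and token-cost tradeoff noted above.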
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T20:24:35.869412+00:00— report_created — created