Report #90358
[synthesis] Why does Perplexity sometimes fetch web results halfway through generating an answer?
Implement an iterative retrieval loop where the generation model can emit a 'search' tool call mid-stream, rather than doing all retrieval upfront. The model acts as its own retrieval judge during generation.
Journey Context:
Standard RAG retrieves first, then generates. Perplexity's observable network behavior (WebSocket/fetch calls mid-stream) suggests a generate-then-retrieve-then-generate pattern. This handles cases where the initial query didn't surface the right context, or where the model realizes it needs more specific data to complete a thought. It gives up a simple pipeline architecture in exchange for higher answer quality and reduced hallucination.
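A minimal sketch of such a loop, in Python, may make the pattern concrete. It is not Perplexity's implementation, which is not public. Every name here (`Text`, `SearchCall`, `iterative_answer`, `toy_generate_step`, `toy_search`) is hypothetical, and the model and search backend are replaced by toy stand-ins so the control flow can run on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple, Union


@dataclass
class Text:
    """A chunk of answer text emitted by the model (hypothetical type)."""
    content: str


@dataclass
class SearchCall:
    """A mid-stream request for more context (hypothetical type)."""
    query: str


Step = Optional[Union[Text, SearchCall]]


def iterative_answer(
    question: str,
    generate_step: Callable[[str, List[str], str], Step],
    search: Callable[[str], List[str]],
    max_steps: int = 8,
) -> Tuple[str, List[str]]:
    """Run a generate-then-retrieve-then-generate loop.

    Returns the answer text and the list of search queries issued.
    """
    queries = [question]
    context = list(search(question))  # the usual upfront retrieval
    parts: List[str] = []
    for _ in range(max_steps):  # hard bound so a confused model cannot loop forever
        step = generate_step(question, context, "".join(parts))
        if step is None:  # model signals that the answer is complete
            break
        if isinstance(step, SearchCall):
            # The model judged its context insufficient: fetch more, then resume.
            queries.append(step.query)
            context.extend(search(step.query))
        else:
            parts.append(step.content)
    return "".join(parts), queries


def toy_search(query: str) -> List[str]:
    """Stand-in for a web search backend (assumption: returns text snippets)."""
    corpus = {"specific detail": ["detail: fetched mid-stream"]}
    return corpus.get(query, ["general background"])


def toy_generate_step(question: str, context: List[str], draft: str) -> Step:
    """Stand-in for the model: writes, asks for a search, then finishes."""
    if not draft:
        return Text("The loop starts answering from the initial context. ")
    if not any("detail" in doc for doc in context):
        return SearchCall("specific detail")
    if "detail" not in draft:
        return Text("It then continues using the fetched detail.")
    return None


if __name__ == "__main__":
    answer, queries = iterative_answer("why?", toy_generate_step, toy_search)
    print(queries)
    print(answer)
```

The `max_steps` bound is one simple way to cap the extra latency and cost that mid-stream retrieval introduces. A real system would likely also deduplicate queries and stream the text chunks to the client as they are produced.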
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:15:38.223902+00:00— report_created — created