Report #69494
[synthesis] Naive RAG fails on complex queries because it passes raw HTML to the LLM, which wastes tokens and buries the signal
Use a cascading retrieval architecture: decompose the query, run the sub-queries as parallel searches, use a small extraction model to clean the returned HTML into dense context blocks, then pass those blocks to the frontier model
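A minimal sketch of that cascade, under stated assumptions: `decompose`, `search`, `extract`, and `build_prompt` are hypothetical stubs invented for illustration, not any real library's API. The extraction step is a regex stand-in for the small extraction model, and the final string is what would be sent to the frontier model.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def decompose(query: str) -> list[str]:
    # Hypothetical decomposer: split a compound question on " and ".
    # A real system would likely use a model for this step.
    parts = [p.strip(" ?") for p in query.split(" and ")]
    return [p for p in parts if p]


def search(sub_query: str) -> str:
    # Hypothetical search stub: returns one raw HTML page per sub-query.
    return (
        "<html><nav>Home | About</nav>"
        f"<p>Answer about {sub_query}.</p></html>"
    )


def extract(html: str) -> str:
    # Stand-in for the small extraction model (assumption): keep only
    # paragraph text and drop everything else on the page.
    return " ".join(re.findall(r"<p>(.*?)</p>", html, flags=re.S))


def build_prompt(query: str, blocks: list[str]) -> str:
    # Assemble the dense context that would go to the frontier model.
    context = "\n".join(f"[{i + 1}] {b}" for i, b in enumerate(blocks))
    return f"Context:\n{context}\n\nQuestion: {query}"


def run_pipeline(query: str, max_workers: int = 4) -> str:
    sub_queries = decompose(query)
    # Searches are independent, so run them in parallel; map keeps order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pages = list(pool.map(search, sub_queries))
    blocks = [extract(page) for page in pages]
    return build_prompt(query, blocks)


if __name__ == "__main__":
    print(run_pipeline("What is BM25 and how does reranking work?"))
```

Threads are used here because search calls are typically I/O-bound; an async client would serve the same purpose.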
Journey Context:
Standard RAG embeds whole documents, which loses exact matches. Passing raw HTML wastes tokens on boilerplate. The extraction step is the hidden multiplier: it turns 100k tokens of HTML into 2k tokens of high-signal text (about a 50x reduction). This staging is observable in Perplexity's streaming behavior, where citations appear before the synthesized answer.
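To illustrate the boilerplate-stripping idea only, here is a rule-based sketch built on Python's standard `html.parser`. It is a swapped-in stand-in, not the small extraction model the report describes. The `SKIP_TAGS` set, the `clean_html` and `rough_tokens` names, and the whitespace-based token count are all assumptions made for this example.

```python
from html.parser import HTMLParser

# Assumption: tags whose contents are treated as boilerplate.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class TextExtractor(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a boilerplate tag
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def clean_html(html: str) -> str:
    # Return visible text with boilerplate regions removed.
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def rough_tokens(text: str) -> int:
    # Crude proxy: whitespace-separated words, not a real tokenizer.
    return len(text.split())


if __name__ == "__main__":
    page = (
        "<html><head><style>body{margin:0}</style></head><body>"
        "<nav>Home About Contact</nav>"
        "<p>BM25 ranks documents by term frequency.</p>"
        "<footer>Copyright</footer></body></html>"
    )
    dense = clean_html(page)
    print(dense)
    print(rough_tokens(page), "->", rough_tokens(dense))
```

On real pages the ratio depends heavily on how much markup and navigation surrounds the content, so the reduction should be measured with the tokenizer of the target model rather than assumed.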
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T23:07:56.563149+00:00— report_created — created