Report #62224
[synthesis] Feeding raw retrieved chunks directly into the generation model, causing lost information in long contexts and citation failures
Insert an evidence assembly stage between retrieval and synthesis. After retrieval and reranking, deduplicate chunks, resolve contradictions, and structure the evidence with clear provenance markers before passing it to the generation model.
Journey Context:
The textbook RAG pipeline is retrieve-augment-generate. Production RAG pipelines, as inferred from Perplexity's observable behavior and architectural signals from multiple products, are retrieve-rerank-assemble-synthesize. The assembly step is the hidden layer. Evidence for its existence: Perplexity's citations are numbered, non-duplicated, and reference specific passages rather than raw search results, and its latency profile shows a gap between when results are available and when synthesis begins, which is consistent with an intermediate processing step. The assembly step does three things: (1) deduplication, because multiple retrieval queries often return overlapping content, and feeding duplicates wastes context and confuses the model; (2) contradiction resolution, because if two sources disagree, assembly marks the conflict so the synthesis model can address it; and (3) provenance tagging, so that each chunk gets a stable citation ID the synthesis model can reference. Without assembly, the generation model must do all of this implicitly, which it does unreliably. This is the root cause of RAG systems losing information that was technically in the context. The tradeoff is that assembly adds 50 to 200 ms of latency and requires a separate processing step, but it dramatically improves citation accuracy and reduces hallucination because the synthesis model receives clean, structured evidence rather than a bag of raw chunks.
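A minimal sketch of the three assembly operations described above, in Python with only the standard library. Everything named here is hypothetical and chosen for illustration: the Chunk and Evidence types, the assemble and render functions, the word-trigram Jaccard test for near-duplicates, and the 0.8 threshold. The contradiction check is passed in as a callable because the report does not say how disagreement is detected; in practice it would likely be an NLI model or an LLM call. None of this is claimed to be how Perplexity or any other product implements the step.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Chunk:
    text: str
    source: str   # URL or document ID (hypothetical field)
    score: float  # reranker score, higher is better


@dataclass
class Evidence:
    cite_id: int                  # stable citation ID for the synthesis prompt
    text: str
    sources: List[str]
    conflicts_with: List[int] = field(default_factory=list)


def _shingles(text: str, n: int = 3) -> Set[str]:
    """Lowercased word n-grams, used as a cheap near-duplicate signature."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < n:
        return {" ".join(words)}
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def _jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assemble(chunks: List[Chunk],
             contradicts: Callable[[str, str], bool],
             dup_threshold: float = 0.8) -> List[Evidence]:
    evidence: List[Evidence] = []
    signatures: List[Set[str]] = []
    # 1. Deduplicate: keep the best-scored copy, merge sources of near-copies.
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        sig = _shingles(chunk.text)
        for ev, ev_sig in zip(evidence, signatures):
            if _jaccard(sig, ev_sig) >= dup_threshold:
                if chunk.source not in ev.sources:
                    ev.sources.append(chunk.source)
                break
        else:
            # 3. Provenance: assign the next citation ID to each unique chunk.
            evidence.append(Evidence(len(evidence) + 1, chunk.text, [chunk.source]))
            signatures.append(sig)
    # 2. Contradictions: mark conflicting pairs rather than silently dropping one.
    for i, a in enumerate(evidence):
        for b in evidence[i + 1:]:
            if contradicts(a.text, b.text):
                a.conflicts_with.append(b.cite_id)
                b.conflicts_with.append(a.cite_id)
    return evidence


def render(evidence: List[Evidence]) -> str:
    """Format the evidence block that is handed to the synthesis model."""
    lines = []
    for ev in evidence:
        line = f"[{ev.cite_id}] ({', '.join(ev.sources)}) {ev.text}"
        if ev.conflicts_with:
            refs = ", ".join(f"[{i}]" for i in ev.conflicts_with)
            line += f" (conflicts with {refs})"
        lines.append(line)
    return "\n".join(lines)
```

To use it, call assemble(reranked_chunks, contradicts=my_checker) and place the output of render(...) in the synthesis prompt, instructing the model to cite only the bracketed IDs. The pairwise contradiction pass is quadratic in the number of unique chunks, which is acceptable for the handful of chunks that survive reranking but would need pruning for larger evidence sets.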
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T10:55:53.739725+00:00— report_created — created