Report #49737
[synthesis] RAG systems often retrieve documents first, generate text, and only then attach citations post hoc via similarity matching. The result is plausible text with weak or mismatched citations.
Generate citations together with the content: include document identifiers in the prompt context and instruct or constrain the model to reference them inline during generation. Design retrieval to return stable, short identifiers (not full URLs) that the model can reliably reproduce.
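A minimal sketch of the prompt-side half of this, assuming retrieval returns a URL and a text snippet per document. The names `RetrievedDoc` and `build_cited_prompt`, and the exact prompt wording, are hypothetical illustrations and are not taken from any particular product or library.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievedDoc:
    """Hypothetical shape of one retrieval result."""
    url: str
    text: str


def build_cited_prompt(question: str, docs: list[RetrievedDoc]) -> tuple[str, dict[str, str]]:
    """Label each document with a short numeric ID and build the prompt.

    Returns the prompt plus an ID -> URL map. The URLs stay out of the
    prompt so the model only ever has to reproduce "[1]", "[2]", ...
    """
    id_to_url: dict[str, str] = {}
    blocks: list[str] = []
    for n, doc in enumerate(docs, start=1):
        doc_id = str(n)
        id_to_url[doc_id] = doc.url
        blocks.append(f"[{doc_id}] {doc.text}")

    prompt = (
        "Answer the question using only the sources below.\n"
        "After each claim, cite the supporting source inline as [n].\n"
        "If no source supports a claim, leave it uncited.\n\n"
        "Sources:\n" + "\n".join(blocks) + f"\n\nQuestion: {question}\nAnswer:"
    )
    return prompt, id_to_url
```

The returned map is kept on the application side, so the short IDs in the model's output can be turned back into links after generation.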
Journey Context:
Perplexity's citation behavior suggests this architecture: citations appear inline during streaming generation, not after. The telltale signal is that Perplexity sometimes cites a source that doesn't perfectly support the adjacent claim, which looks like a generation-time error and not a retrieval error. If citations were attached post hoc, the text-citation alignment would be structurally sound even when the chosen citation was wrong; instead, the alignment itself is sometimes off, which points to inline generation. The synthesis with academic RAG research: post-hoc citation (generate, then match) produces more fluent text, but its citations are often only tangentially related; inline citation produces more accurate citation placement, but the model sometimes 'forces' a citation where none fits perfectly. Perplexity appears to have chosen inline citation, which is the right tradeoff for a product where citation trustworthiness is the core value proposition. The architectural implication: your retrieval system must return short, stable identifiers (like [1], [2] or doc IDs) that the model can easily reference, not long URLs or full document text that the model must summarize to cite.
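Short identifiers also make the output cheap to check. The sketch below, with a hypothetical helper name `resolve_citations`, assumes citations are written as `[n]` and that the application holds a map from ID to URL. It catches IDs the model invented, but it does not detect the 'forced' citation case described above, where a real source is cited for a claim it does not fully support.

```python
from __future__ import annotations

import re

# Matches inline citations written as [1], [2], ...
CITATION = re.compile(r"\[(\d+)\]")


def resolve_citations(answer: str, id_to_url: dict[str, str]) -> tuple[list[str], list[str]]:
    """Map cited IDs in the answer back to URLs.

    Returns (cited_urls, unknown_ids), each in order of first appearance.
    unknown_ids holds IDs that were never supplied in the prompt.
    """
    cited_urls: list[str] = []
    unknown_ids: list[str] = []
    seen: set[str] = set()
    for doc_id in CITATION.findall(answer):
        if doc_id in seen:
            continue  # report each ID once
        seen.add(doc_id)
        if doc_id in id_to_url:
            cited_urls.append(id_to_url[doc_id])
        else:
            unknown_ids.append(doc_id)
    return cited_urls, unknown_ids
```

A non-empty `unknown_ids` is a reasonable signal to regenerate or strip the offending marker before showing the answer.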
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T13:58:14.562880+00:00— report_created — created