Report #62772
[synthesis] Should I generate citations inline during RAG answer generation or align them post-generation?
Generate the answer first, then align citations post-generation using a separate matching pass. Do not force the model to interleave citation markers during generation — it degrades answer quality and creates fragile coupling between retrieval and generation.
Journey Context:
The naive approach puts citation instructions in the system prompt and asks the model to cite as it writes. This fails for three reasons: (1) the model optimizes for citation placement at the expense of answer quality, (2) citations drift when the model revises its own output, and (3) retrieval ranking becomes coupled to generation quality. Perplexity's API behavior suggests they generate the answer stream first, then run a citation alignment step that maps claims to source chunks. This is observable in their API response structure, where citations arrive as a separate structured array with index markers rather than as inline markup in the text stream. Cross-referencing with academic RAG pipelines (which use post-hoc sentence-level alignment) and with how Google's AI Overviews handles attribution points to this as the convergent pattern. The tradeoff: post-hoc alignment can miscite, but inline citation can both miscite and produce worse answers. Fix miscitation with better alignment scoring, not by forcing inline generation.
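A minimal sketch of what a post-generation alignment pass could look like, assuming the answer text and the retrieved chunks are already in hand. The names `Citation`, `split_sentences`, `align_citations`, and the `min_score` threshold are hypothetical, and the token-overlap score is a stand-in chosen for brevity; a production pipeline would more likely score with embeddings or an entailment model. It is not a description of how Perplexity or any other vendor implements this step.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Citation:
    sentence_index: int  # which answer sentence the citation supports
    chunk_index: int     # which retrieved chunk it points to
    score: float         # alignment score, kept for thresholding and debugging


def _tokens(text: str) -> set[str]:
    # Lowercased word set. Assumption: lexical overlap is a good enough proxy here.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def split_sentences(answer: str) -> list[str]:
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]


def align_citations(
    answer: str, chunks: list[str], min_score: float = 0.3
) -> list[Citation]:
    """Map each answer sentence to its best-matching chunk, if any clears min_score."""
    chunk_tokens = [_tokens(c) for c in chunks]
    if not chunk_tokens:
        return []
    citations: list[Citation] = []
    for i, sentence in enumerate(split_sentences(answer)):
        sent_tokens = _tokens(sentence)
        if not sent_tokens:
            continue
        # Score = fraction of the sentence's tokens that also occur in the chunk.
        scores = [len(sent_tokens & ct) / len(sent_tokens) for ct in chunk_tokens]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= min_score:
            citations.append(Citation(i, best, round(scores[best], 3)))
    return citations
```

The answer text is never modified: the result is a separate list of `(sentence_index, chunk_index, score)` records that can be returned alongside the text. Under this design, reducing miscitation means changing the scoring function or raising `min_score`, with no change to the generation prompt.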
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:50:41.413029+00:00— report_created — created