Report #13724
[research] LLM generates plausible but fabricated academic citations or URLs
Implement strict citation verification: require the LLM to quote only verbatim excerpts from retrieved documents, cite each excerpt by its source chunk ID, and never generate URLs from scratch.
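A minimal sketch of the verbatim-excerpt check described above, assuming the model is prompted to cite in a hypothetical `[chunk:<id>] "<excerpt>"` format and that retrieved chunks are available as a dict of chunk ID to text. The names `verify_citations`, `all_verified`, and `CitationCheck` are illustrative, not from any particular library. The check confirms that every cited chunk ID was actually retrieved and that the quoted text appears in that chunk after whitespace normalization.

```python
import re
from dataclasses import dataclass

# ASSUMPTION: the model is prompted to cite in this hypothetical format:
#   [chunk:<chunk_id>] "<verbatim excerpt>"
CITATION_RE = re.compile(r'\[chunk:([A-Za-z0-9_-]+)\]\s*"([^"]+)"')


@dataclass
class CitationCheck:
    chunk_id: str
    excerpt: str
    ok: bool
    reason: str


def normalize(text: str) -> str:
    # Collapse all whitespace runs so line wrapping in the chunk does not
    # cause a genuine quote to be rejected.
    return " ".join(text.split())


def verify_citations(answer: str, chunks: dict[str, str]) -> list[CitationCheck]:
    """Check every cited excerpt against the retrieved chunk it claims to quote."""
    results = []
    for chunk_id, excerpt in CITATION_RE.findall(answer):
        source = chunks.get(chunk_id)
        if source is None:
            # The model cited a chunk that was never retrieved.
            results.append(CitationCheck(chunk_id, excerpt, False, "unknown chunk ID"))
        elif normalize(excerpt) not in normalize(source):
            # The quote is paraphrased or invented, not verbatim.
            results.append(CitationCheck(chunk_id, excerpt, False, "excerpt not in chunk"))
        else:
            results.append(CitationCheck(chunk_id, excerpt, True, "verified"))
    return results


def all_verified(results: list[CitationCheck]) -> bool:
    # An answer with no citations at all is also treated as unverified.
    return bool(results) and all(r.ok for r in results)
```

For example, with `chunks = {"doc1-c3": "Transformers use self-attention\nto weigh tokens."}`, the citation `[chunk:doc1-c3] "use self-attention to weigh tokens"` passes, while a citation to an unretrieved ID such as `doc9-c1` is rejected as an unknown chunk ID. One limitation of this sketch is that excerpts containing double quotes are not matched by the pattern, so a real deployment would need a citation format that cannot collide with quoted text. On failure, rejecting or regenerating the answer is safer than silently dropping the bad citation.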
Journey Context:
LLMs are trained to predict plausible token sequences, which makes them very good at producing realistic-looking but non-existent URLs, DOIs, and author lists. Relying on the model to "remember" citations fails because its training objective rewards surface-level plausibility, not factual accuracy. The only reliable fix is to force the model to ground every citation in a retrieval system and to strictly validate each generated link against the retrieved corpus.
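The link-validation step could be sketched as below, assuming the URLs and DOIs of the retrieved documents are collected into a set of strings. `find_ungrounded_links` and its regular expressions are hypothetical and deliberately loose; they flag any URL or DOI in the answer that does not match a retrieved source exactly, on the principle that anything the model did not copy from the corpus may be fabricated.

```python
import re

# Deliberately loose patterns; a production system may need a real URL parser.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def _clean(token: str) -> str:
    # Drop trailing punctuation that usually belongs to the sentence, not the link.
    return token.rstrip(".,;:")


def find_ungrounded_links(answer: str, retrieved_sources: set[str]) -> list[str]:
    """Return every URL or DOI in `answer` that is absent from `retrieved_sources`.

    Matching is exact string comparison: anything the model did not copy
    from the retrieved corpus is treated as potentially fabricated.
    """
    urls = [_clean(u) for u in URL_RE.findall(answer)]
    # Blank out URLs first so a DOI embedded in a doi.org link is not counted twice.
    dois = [_clean(d) for d in DOI_RE.findall(URL_RE.sub(" ", answer))]
    return [link for link in urls + dois if link not in retrieved_sources]
```

Exact matching is intentionally strict: a link that differs from the retrieved source by even one character is flagged rather than trusted, which trades some false positives for never passing an invented URL or DOI.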
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T19:40:03.260439+00:00— report_created — created