Report #81834
[research] LLM generates plausible but non-existent academic citations or DOIs
Require the LLM to extract verbatim quotes from source text before generating a citation, and strictly bind citation generation to a retrieval tool's output; never rely on parametric memory for references.
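A minimal sketch of that binding, assuming a retrieval tool that returns documents with an id, a citation string, and source text. The names `RetrievedDoc` and `cite_if_grounded` are hypothetical and not part of any specific library. The model supplies only a document id and a quote; the citation string always comes from the retrieval output.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class RetrievedDoc:
    """Hypothetical shape of one retrieval result (an assumption for this sketch)."""
    doc_id: str    # identifier assigned by the retrieval tool
    citation: str  # citation string supplied by the tool, never by the model
    text: str      # source text returned by the tool


def _normalize(s: str) -> str:
    # Collapse runs of whitespace so line wrapping does not break a verbatim match.
    return re.sub(r"\s+", " ", s).strip()


def cite_if_grounded(
    doc_id: str, quote: str, retrieved: Dict[str, RetrievedDoc]
) -> Optional[str]:
    """Return the tool-provided citation only if the quote appears verbatim
    (up to whitespace) in the retrieved text; otherwise return None."""
    doc = retrieved.get(doc_id)
    if doc is None:
        return None  # the model referenced a document the tool never returned
    q = _normalize(quote)
    if not q or q not in _normalize(doc.text):
        return None  # quote is empty or not found in the source text
    return doc.citation


# Usage with placeholder data (not a real publication):
docs = {
    "d1": RetrievedDoc(
        doc_id="d1",
        citation="Example Author (2020). Placeholder Title.",
        text="Grounding reduces\nfabricated references.",
    )
}
accepted = cite_if_grounded("d1", "Grounding reduces fabricated references.", docs)
rejected = cite_if_grounded("d1", "Grounding eliminates fabrication.", docs)
```

Here `accepted` holds the tool's citation string and `rejected` is `None`, so a paraphrased or invented quote cannot produce a reference.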
Journey Context:
LLMs are trained to be helpful and will synthesize plausible-sounding paper titles and authors to satisfy a request. Formatting (like APA or BibTeX) acts as a confidence trap: the syntax is perfectly valid, which masks the semantic fabrication. Relying on the model to 'just know' academic literature fails because next-token prediction favors structurally plausible output over references that actually exist. Grounding via RAG with strict extraction constraints is the only reliable mitigation.
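The confidence trap can be illustrated with a short sketch: a format check accepts a made-up DOI, while a membership check against the retrieval output rejects it. The regular expression is a commonly used DOI shape check and is an assumption here, not a complete validator; both DOIs and the function names are placeholders invented for illustration.

```python
import re
from typing import Iterable

# Shape check only: a "10." prefix, a 4-9 digit registrant code, a slash, and a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)


def looks_like_doi(candidate: str) -> bool:
    """True if the string is syntactically DOI-shaped. Says nothing about existence."""
    return bool(DOI_PATTERN.match(candidate))


def is_grounded_doi(candidate: str, retrieved_dois: Iterable[str]) -> bool:
    """True only if the DOI was actually returned by the retrieval tool
    (compared case-insensitively)."""
    return candidate.lower() in {d.lower() for d in retrieved_dois}


retrieved = {"10.5555/retrieved.example"}  # placeholder for the tool's output
fabricated = "10.1234/made-up.2023.001"    # placeholder for a model-invented DOI

passes_format = looks_like_doi(fabricated)            # syntax looks valid
passes_grounding = is_grounded_doi(fabricated, retrieved)  # not in retrieval output
```

The fabricated string passes the format check and fails the grounding check, which is why a syntax validator alone cannot serve as the gate.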
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:57:14.541634+00:00— report_created — created