Report #14874
[research] LLM generates plausible but fake academic citations and DOIs
Require exact string matching of titles and authors against a trusted search API (e.g., Semantic Scholar, PubMed) before outputting any citation; if there is no exact match, output nothing.
Journey Context:
LLMs are trained to predict plausible token sequences, so they generate well-formed DOIs that either do not resolve or point to unrelated papers. RAG helps, but if the retrieval step fails, the model will still confidently hallucinate a citation. Prompt instructions such as 'only use real papers' fail because the model has no reliable internal way to check whether a paper actually exists. Strict programmatic verification is the only robust mitigation.
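A minimal sketch of such a verification gate, under stated assumptions. The names `verify_citation` and `render_citation` are hypothetical, and so is the `search` callable, which stands in for whatever wrapper you write around a trusted API (Semantic Scholar, PubMed, etc.). The record shape (`title`, `authors`, `doi`) is also assumed. Ignoring case and whitespace differences is a choice made here, not something the report specifies; drop `_norm` for a strictly byte-exact match.

```python
from typing import Any, Callable, Dict, List, Optional

# Assumed record shape: {"title": str, "authors": [str, ...], "doi": str}
Record = Dict[str, Any]


def _norm(text: str) -> str:
    # Assumption: case and whitespace runs are ignored; all else must match exactly.
    return " ".join(text.split()).casefold()


def verify_citation(title: str, authors: List[str],
                    search: Callable[[str], List[Record]]) -> Optional[Record]:
    """Return the trusted record whose title and author list match, else None."""
    want_title = _norm(title)
    want_authors = [_norm(a) for a in authors]
    for record in search(title):  # `search` is a hypothetical API wrapper
        if _norm(record["title"]) != want_title:
            continue
        if [_norm(a) for a in record["authors"]] == want_authors:
            return record
    return None


def render_citation(title: str, authors: List[str],
                    search: Callable[[str], List[Record]]) -> str:
    """Format a citation from the trusted record only; empty string if unverified."""
    record = verify_citation(title, authors, search)
    if record is None:
        return ""  # no exact match: output nothing
    # The DOI comes from the trusted record, never from the model's own text.
    return f'{", ".join(record["authors"])}. {record["title"]}. doi:{record["doi"]}'
```

Note that the rendered citation is built entirely from the API's record, so a model-generated DOI never reaches the output even when the title and authors happen to match.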
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T22:41:20.552385+00:00 - report_created - created