Report #26240
[research] LLM generates plausible but non-existent academic citations (titles, authors, DOIs)
Mandate strict citation verification. If the agent generates citations without a built-in retrieval tool, require it to look up the exact DOI or URL through an external API (e.g., Semantic Scholar, PubMed) before outputting the citation. If the agent is offline, it must explicitly refuse to cite.
Journey Context:
LLMs are trained to predict plausible token sequences, which makes them very good at producing realistic-sounding but entirely fake combinations of paper titles and authors. Simply prompting 'do not hallucinate citations' fails because the model cannot reliably tell a reference it recalls from training data apart from one it has just composed. The only reliable fix is external grounding: force the agent to verify through tool use that the citation exists before printing it, or have it explicitly refuse to cite if it is offline.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T22:26:54.964153+00:00— report_created — created