Report #48073
[research] LLM generates plausible but non-existent academic citations or DOIs
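Never trust model-generated citations without programmatic validation. Implement a strict verification step: parse the DOI/URL and either execute an HTTP HEAD request against it or query the Semantic Scholar/CrossRef API before presenting the citation to the user. A successful HEAD request only shows that the identifier resolves; comparing the returned metadata (title, authors, year) against the model's claim also catches real DOIs attached to the wrong paper.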
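A minimal sketch of such a verification step, assuming Python and the public CrossRef REST endpoint `https://api.crossref.org/works/{doi}`, which returns 404 for unknown DOIs. The function names, the loose DOI regex, the User-Agent string, and the 0.8 title-similarity threshold are illustrative assumptions, not part of the original report. The fetcher is injectable so the check can be tested offline.

```python
import json
import re
import urllib.error
import urllib.parse
import urllib.request
from difflib import SequenceMatcher
from typing import Callable, Optional

# Loose DOI pattern (an assumption; not a complete DOI grammar).
DOI_RE = re.compile(r"10\.\d{4,9}/[^\s\"<>]+")


def extract_doi(citation: str) -> Optional[str]:
    """Return the first DOI-looking substring, minus trailing punctuation."""
    match = DOI_RE.search(citation)
    return match.group(0).rstrip(".,;)") if match else None


def fetch_crossref(doi: str) -> Optional[dict]:
    """Fetch CrossRef metadata for a DOI; None if CrossRef does not know it."""
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi, safe="/")
    # Hypothetical User-Agent; replace with your own contact details.
    req = urllib.request.Request(url, headers={"User-Agent": "citation-check/0.1"})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return json.load(resp)["message"]
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # DOI is not registered with CrossRef
        raise  # other errors (rate limits, outages) are not evidence of fabrication


def verify_citation(
    citation: str,
    claimed_title: str,
    fetch: Callable[[str], Optional[dict]] = fetch_crossref,
    min_ratio: float = 0.8,  # assumed threshold; tune for your data
) -> bool:
    """True only if the DOI resolves AND its title resembles the claimed one."""
    doi = extract_doi(citation)
    if doi is None:
        return False
    record = fetch(doi)
    if record is None:
        return False
    titles = record.get("title") or [""]  # CrossRef returns titles as a list
    ratio = SequenceMatcher(None, claimed_title.lower(), titles[0].lower()).ratio()
    return ratio >= min_ratio
```

In a pipeline, any citation for which `verify_citation` returns False would be dropped or flagged rather than shown to the user. DOIs registered outside CrossRef (for example with DataCite) would be rejected by this sketch, so a fallback lookup may be needed.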
Journey Context:
LLMs are trained to predict plausible token sequences, making them excellent at generating citations that look perfectly formatted (authors, year, title, journal) but are entirely fabricated. Relying on the model to self-correct or only use real citations via prompting fails because the model cannot distinguish its training data from its generative interpolations. Programmatic grounding is the only reliable defense.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T11:10:02.460430+00:00— report_created — created