Report #40927
[research] LLM generates plausible but non-existent academic citations or DOIs
Never output a DOI, arXiv ID, or URL without first verifying it against an external source (e.g., the Semantic Scholar API or Crossref), unless generation is strictly constrained to identifiers that appear in a provided context window.
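The context-constrained option can be enforced with a post-generation check. The following is a minimal sketch, not a definitive implementation: the function names `extract_dois` and `ungrounded_dois` are hypothetical, and the DOI regex is a simplifying assumption that will miss some valid DOIs.

```python
from __future__ import annotations

import re

# Simplified DOI pattern (an assumption; real DOIs allow a wider character set).
DOI_RE = re.compile(r"10\.\d{4,9}/[^\s\"'<>]+")


def extract_dois(text: str) -> set[str]:
    """Return lowercased DOIs found in text, with trailing punctuation stripped."""
    return {m.rstrip(".,;:)]}").lower() for m in DOI_RE.findall(text)}


def ungrounded_dois(answer: str, context: str) -> set[str]:
    """Return DOIs that appear in the model's answer but not in the supplied context."""
    return extract_dois(answer) - extract_dois(context)
```

Any non-empty result from `ungrounded_dois` means the answer contains an identifier the context never supplied, so the answer should be blocked or the identifier removed before it reaches the user.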
Journey Context:
LLMs are trained to predict plausible token sequences, not to index truth. A fake citation looks structurally perfect (author, year, title format) but is entirely confabulated. Prompting 'do not hallucinate' fails because the model doesn't know the boundary between its knowledge and confabulation. The only reliable fix is external grounding at inference time.
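External grounding for DOIs might look like the sketch below, which asks the public Crossref REST API (`https://api.crossref.org/works/<doi>`) whether a DOI is registered. The names `crossref_status` and `doi_is_registered` and the `User-Agent` string are hypothetical, and the lookup is injectable so the check can be tested offline. Two caveats apply: a 404 only means the DOI is not registered with Crossref (DOIs issued by other agencies will not be found), and a 200 confirms the DOI exists, not that it matches the title or authors the model attached to it.

```python
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable

CROSSREF_WORKS = "https://api.crossref.org/works/"


def crossref_status(doi: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code Crossref gives for a DOI lookup."""
    url = CROSSREF_WORKS + urllib.parse.quote(doi, safe="/")
    # Hypothetical User-Agent value; replace with one identifying your tool.
    req = urllib.request.Request(url, headers={"User-Agent": "citation-check/0.1"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Unknown DOIs come back as an HTTP error (404), not a normal response.
        return err.code


def doi_is_registered(doi: str, lookup: Callable[[str], int] = crossref_status) -> bool:
    """True only when the lookup confirms the DOI; every other outcome is unverified."""
    try:
        return lookup(doi) == 200
    except OSError:
        # Network failures and timeouts fail closed: unverified means do not emit.
        return False
```

Failing closed is the deliberate design choice here: if the verifier cannot be reached, the identifier is treated as unverified and withheld.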
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:10:01.457114+00:00— report_created — created