Report #6049
[research] LLM generates plausible but non-existent academic citations (DOIs, authors, titles)
Never trust model-generated citations without external validation. Always use a tool/API (e.g., Semantic Scholar, PubMed, Crossref) to verify that the paper exists and to fetch its actual DOI, rather than relying on the model's internal recall.
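A minimal sketch of the verification step in Python, using only the standard library and the public Crossref REST API (`/works/{doi}` to check a DOI, `/works?query.bibliographic=...` to look one up from a title). The helper names (`verify_doi`, `find_doi`, `http_get_json`), the 0.9 title-similarity threshold, the `difflib`-based matching heuristic and the placeholder `mailto` contact are all illustrative assumptions, not part of the original report. It checks only the DOI and title; authors, venue and year would need the same treatment.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from difflib import SequenceMatcher
from typing import Callable, Optional

CROSSREF = "https://api.crossref.org/works"
# Crossref asks clients to identify themselves; the contact address is a placeholder.
HEADERS = {"User-Agent": "citation-check/0.1 (mailto:you@example.org)"}

FetchJSON = Callable[[str], Optional[dict]]


def http_get_json(url: str, timeout: float = 10.0) -> Optional[dict]:
    """GET a Crossref URL; return the parsed JSON body, or None on HTTP 404."""
    req = urllib.request.Request(url, headers=HEADERS)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:  # Crossref has no record for this DOI
            return None
        raise


def _norm(s: str) -> str:
    """Lower-case and collapse whitespace so formatting differences don't count."""
    return " ".join(s.lower().split())


def _similarity(a: str, b: str) -> float:
    """Heuristic title similarity in [0, 1] (assumption: difflib ratio is good enough)."""
    return SequenceMatcher(None, _norm(a), _norm(b)).ratio()


def verify_doi(doi: str, claimed_title: str, fetch: FetchJSON = http_get_json,
               threshold: float = 0.9) -> dict:
    """Check that a model-supplied DOI exists AND resolves to the claimed title."""
    body = fetch(f"{CROSSREF}/{urllib.parse.quote(doi, safe='/')}")
    if body is None:
        return {"status": "doi_not_found", "doi": doi}
    record = body["message"]
    real_title = (record.get("title") or [""])[0]
    score = _similarity(claimed_title, real_title)
    return {
        "status": "verified" if score >= threshold else "title_mismatch",
        "doi": record.get("DOI", doi),
        "title": real_title,
        "similarity": round(score, 3),
    }


def find_doi(claimed_title: str, fetch: FetchJSON = http_get_json,
             threshold: float = 0.9) -> Optional[str]:
    """Look up the real DOI for a title instead of trusting the model's DOI."""
    query = urllib.parse.urlencode({"query.bibliographic": claimed_title, "rows": 1})
    body = fetch(f"{CROSSREF}?{query}")
    items = (body or {}).get("message", {}).get("items", [])
    if not items:
        return None
    top = items[0]
    real_title = (top.get("title") or [""])[0]
    # Crossref returns its best match even for invented titles, so require agreement.
    return top.get("DOI") if _similarity(claimed_title, real_title) >= threshold else None
```

The `fetch` parameter is injectable so the checks can be unit-tested offline with a stub, or pointed at a Semantic Scholar or PubMed client instead. Treating anything other than `"verified"` as a hallucination is the conservative choice; a `title_mismatch` often means the model attached a real DOI to an invented title.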
Journey Context:
LLMs are trained to predict plausible token sequences, which makes them excellent at generating realistic-sounding paper titles and author names that fit a genre, but poor at exact recall of sparse metadata such as DOIs, page numbers and full author lists. Relying on the model to 'remember' a citation almost guarantees hallucination on obscure topics, where the training data mentions a given paper rarely or not at all. The tradeoff is the added latency and cost of an API call per citation, but that call is strictly necessary for factual grounding.
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T23:06:08.005095+00:00— report_created — created