Report #20980
[research] LLM generates plausible but non-existent academic citations, DOIs, or URLs when asked for sources
Never generate citations from parametric memory. Restrict citations to verbatim extraction from the provided context, or add a validation step that resolves each DOI or URL (for example with an HTTP GET) before presenting it to the user.
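A minimal sketch of the validation step, assuming Python and only the standard library. The names `to_url` and `resolves`, the `opener` parameter, and the User-Agent string are hypothetical and exist only for illustration. It also assumes that bare DOIs are resolved through the public https://doi.org/ proxy.

```python
import http.client
import re
import urllib.error
import urllib.request
from typing import Optional

# Loose DOI shape check (assumption: "10.<registrant>/<suffix>"); not a full validator.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def to_url(ref: str) -> Optional[str]:
    """Map a DOI or URL string to a fetchable URL, or None if it is neither."""
    ref = ref.strip()
    if ref.lower().startswith(("http://", "https://")):
        return ref
    if DOI_PATTERN.match(ref):
        return "https://doi.org/" + ref
    return None


def resolves(ref: str, timeout: float = 10.0, opener=urllib.request.urlopen) -> bool:
    """Return True only if the reference can be fetched without an error status.

    `opener` is injectable so the check can be exercised without network access.
    """
    url = to_url(ref)
    if url is None:
        return False
    request = urllib.request.Request(url, headers={"User-Agent": "citation-check/0.1"})
    try:
        # urlopen follows redirects; HTTP 4xx/5xx responses raise HTTPError.
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # URLError, HTTPError and timeouts are all OSError subclasses.
        return False
```

A link that resolves only proves that something exists at that address, not that it says what the model claims. Some publishers also reject automated requests, so a `False` result is best treated as "needs manual review" rather than "fabricated".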
Journey Context:
LLMs are trained to predict plausible token sequences, which makes them very good at producing realistic-looking but fake author names, titles, and DOIs. Relying on the model's internal weights for citation facts is therefore unreliable by design. RAG helps, but models still hallucinate when the retrieved context doesn't contain the answer. Strict extraction plus external verification is the only reliable guardrail against the 'hallucination snowball' effect, where one invented source gets built on in later output.
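The strict-extraction half can be sketched as a post-filter: any DOI or URL in the model's answer that does not appear verbatim in the supplied context is flagged rather than shown. This is an illustrative sketch in Python. `LINK_PATTERN`, `extract_links`, and `split_grounded` are hypothetical names, and the regex is deliberately loose and will miss edge cases.

```python
import re
from typing import List, Tuple

# Loose pattern for illustration: http(s) URLs, or bare DOIs of the form 10.NNNN/suffix.
LINK_PATTERN = re.compile(r"https?://[^\s<>)\]]+|\b10\.\d{4,9}/[^\s<>)\]]+")


def extract_links(text: str) -> List[str]:
    """Find DOI/URL-like strings, dropping trailing sentence punctuation."""
    return [match.rstrip(".,;") for match in LINK_PATTERN.findall(text)]


def split_grounded(answer: str, context: str) -> Tuple[List[str], List[str]]:
    """Split links in `answer` into (found verbatim in context, not found)."""
    allowed = set(extract_links(context))
    grounded: List[str] = []
    ungrounded: List[str] = []
    for link in extract_links(answer):
        (grounded if link in allowed else ungrounded).append(link)
    return grounded, ungrounded


context = "See Smith (2020), doi:10.1000/xyz123, and https://example.org/paper."
answer = "Sources: 10.1000/xyz123 and 10.9999/made.up.2021; also https://example.org/paper"
grounded, ungrounded = split_grounded(answer, context)
print("grounded:", grounded)
print("ungrounded:", ungrounded)
```

Anything in `ungrounded` came from somewhere other than the provided context, so it should be dropped or sent to external verification instead of being presented as a source.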
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T13:37:36.415323+00:00— report_created — created