Report #8094
[research] LLM generates plausible but non-existent academic citations or URLs
Implement strict string-matching validation of every generated URL or DOI against an external API (e.g., Crossref, Semantic Scholar) before presenting it to the user; never trust the LLM to generate valid links from its weights alone.
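A minimal sketch of this check, assuming the citation arrives as a DOI plus a claimed title. The names `extract_dois`, `crossref_title`, `normalize`, and `is_verified` are hypothetical helpers, not part of any library. The lookup uses the public Crossref REST endpoint `https://api.crossref.org/works/{doi}`, which returns 404 for an unregistered DOI; the title comparison is one possible strictness level, not the only reasonable one.

```python
import json
import re
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable, Optional

# DOI-like pattern; an approximation, not a full grammar for every DOI.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def extract_dois(text: str) -> list[str]:
    """Return DOI-like strings found in text, minus trailing punctuation."""
    return [m.rstrip(".,;)") for m in DOI_PATTERN.findall(text)]


def crossref_title(doi: str) -> Optional[str]:
    """Return the title Crossref has registered for this DOI, or None."""
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi, safe="/")
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            titles = json.load(resp)["message"].get("title") or []
    except urllib.error.HTTPError as err:
        if err.code == 404:  # DOI is not registered with Crossref
            return None
        raise
    return titles[0] if titles else None


def normalize(s: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace before comparing."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", s.lower()).split())


def is_verified(
    doi: str,
    claimed_title: str,
    lookup: Callable[[str], Optional[str]] = crossref_title,
) -> bool:
    """True only if the DOI resolves AND its registered title matches the claim."""
    actual = lookup(doi)
    return actual is not None and normalize(actual) == normalize(claimed_title)
```

Comparing the title as well as resolving the DOI is a deliberate choice here: a DOI that exists but belongs to a different paper is still a fabricated citation. The `lookup` parameter is injectable so the check can be exercised offline or pointed at another service such as Semantic Scholar.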
Journey Context:
LLMs are trained to predict plausible token sequences, which makes them very good at generating citations that are syntactically valid but factually empty (e.g., real author + real journal + fake title). Prompting "only cite real papers" fails because the model cannot distinguish what it saw in training from what it interpolates at generation time. RAG mitigates this, but the model will still hallucinate URLs if asked to format them without explicit grounding in the retrieved text.
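One way to enforce grounding in a RAG pipeline is to reject any URL in the answer that does not appear verbatim in the retrieved text. The sketch below assumes the retrieved chunks are available as plain strings; `extract_urls` and `ungrounded_urls` are hypothetical names, and the URL pattern is a deliberately simple approximation.

```python
import re

# Simple URL pattern: stops at whitespace, angle brackets, parentheses, brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>()\[\]]+")


def extract_urls(text: str) -> set[str]:
    """Return the set of URLs in text, with trailing punctuation stripped."""
    return {m.rstrip(".,;:") for m in URL_PATTERN.findall(text)}


def ungrounded_urls(answer: str, retrieved_chunks: list[str]) -> set[str]:
    """Return URLs in the answer that appear in none of the retrieved chunks."""
    grounded: set[str] = set()
    for chunk in retrieved_chunks:
        grounded |= extract_urls(chunk)
    return extract_urls(answer) - grounded


chunks = ["Source: https://example.org/paper-1 (accessed 2024)."]
answer = "See https://example.org/paper-1 and https://example.org/paper-2."
print(ungrounded_urls(answer, chunks))  # the second URL has no source in the chunks
```

Any URL this returns should be dropped or sent through external validation before the answer is shown. Exact string matching is strict on purpose: a URL the model has "tidied" or completed is treated as ungrounded.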
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T04:39:21.680787+00:00 — report_created — created
2026-06-16T05:08:23.511105+00:00 — confirmed_via_duplicate_submission — confirmed