Report #66646
[research] LLM generates plausible but non-existent academic citations or URLs
Implement strict citation verification: have the LLM emit its citations as structured JSON with explicit identifiers (DOI, arXiv ID, or URL), then programmatically validate each identifier against an external API (e.g., CrossRef, Semantic Scholar) before presenting anything to the user. If validation fails, strip the citation or trigger a retry.
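A minimal sketch of the validation step, assuming the model was asked to return JSON shaped like {"citations": [{"title": ..., "doi": ...}]}. That schema and the names filter_citations and crossref_doi_exists are illustrative, not part of any library. The lookup uses the public Crossref REST endpoint https://api.crossref.org/works/{doi}. One caveat: a DOI registered with another agency (DataCite, for example) can be real and still missing from Crossref, so a 404 here may need a second lookup elsewhere before the citation is dropped.

```python
import json
import re
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable, List, Tuple

# Loose syntactic check ("10.<registrant>/<suffix>"). Passing it does NOT
# mean the DOI exists; it only filters out obviously malformed strings.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def crossref_doi_exists(doi: str, timeout: float = 10.0) -> bool:
    """Return True if the Crossref REST API has a record for this DOI."""
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi, safe="/")
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # rate limits or outages are not evidence of a fake DOI


def filter_citations(
    raw_json: str,
    doi_exists: Callable[[str], bool] = crossref_doi_exists,
) -> Tuple[List[dict], List[dict]]:
    """Split model-produced citations into (verified, rejected).

    Assumed (hypothetical) schema: {"citations": [{"title": ..., "doi": ...}]}.
    """
    citations = json.loads(raw_json)["citations"]
    verified, rejected = [], []
    for cit in citations:
        doi = str(cit.get("doi") or "").strip()
        # Only call the external API for syntactically plausible DOIs.
        if DOI_PATTERN.match(doi) and doi_exists(doi):
            verified.append(cit)
        else:
            rejected.append(cit)
    return verified, rejected
```

The doi_exists parameter is injectable so the filter can be unit-tested without network access, or pointed at a different resolver. Only the verified list should reach the user; the rejected list can be logged or fed into a retry.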
Journey Context:
LLMs are trained to predict plausible token sequences, not to retrieve facts. A well-formed DOI or arXiv ID is therefore statistically likely to be generated even when it points to nothing. Relying on the LLM to self-correct via prompting ('Are you sure?') often leads to more confident hallucinations. Programmatic validation is the only reliable circuit breaker.
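One way the retry path could look, as a sketch under stated assumptions: instead of asking the model whether it is sure, the loop passes back the identifiers that failed external validation and caps the number of attempts. The names generate_until_verified, generate, and is_valid are hypothetical; generate stands in for whatever call produces candidate identifiers, and is_valid for an external check such as a DOI lookup.

```python
from typing import Callable, List, Tuple


def generate_until_verified(
    generate: Callable[[List[str]], List[str]],
    is_valid: Callable[[str], bool],
    max_attempts: int = 3,
) -> Tuple[List[str], List[str]]:
    """Retry generation, feeding back identifiers that failed validation.

    Returns (verified identifiers from the last attempt,
             every identifier rejected along the way).
    """
    rejected: List[str] = []
    verified: List[str] = []
    for _ in range(max_attempts):
        # The model sees concrete validator output, not "Are you sure?".
        candidates = generate(list(rejected))
        verified, failed = [], []
        for ident in candidates:
            (verified if is_valid(ident) else failed).append(ident)
        if not failed:
            break  # everything checked out; stop retrying
        rejected.extend(f for f in failed if f not in rejected)
    # If attempts run out, unverified identifiers are simply dropped.
    return verified, rejected
```

Capping attempts keeps a model that keeps inventing identifiers from looping forever; once the cap is reached, only the identifiers that passed validation are returned.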
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T18:20:49.152194+00:00— report_created — created