Report #17879
[research] Generating academic citations or library documentation URLs that are plausible but entirely fabricated
Require strict retrieval-augmented generation (RAG) for every citation. Before output, check each generated identifier: a regex catches malformed DOIs and arXiv IDs, and a lookup against the CrossRef or arXiv API confirms that the identifier exists. If live validation isn't available, append 'Citation verification pending'.
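A minimal sketch of the output gate described above, in Python. The function names (`extract_identifiers`, `gate_citations`) and the `verify` callback are hypothetical, introduced only for illustration, and the regexes are simplified approximations of the DOI and arXiv ID shapes. A regex match shows only that a string is shaped like an identifier, not that it exists.

```python
import re
from typing import Callable, List, Optional, Tuple

# Simplified, assumed patterns: they check shape only, never existence.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")
ARXIV_RE = re.compile(r"\barXiv:(\d{4}\.\d{4,5})(v\d+)?\b", re.IGNORECASE)

PENDING = "Citation verification pending"


def extract_identifiers(text: str) -> List[Tuple[str, str]]:
    """Return (kind, value) pairs for every DOI-like or arXiv-like string."""
    # Strip trailing sentence punctuation that the DOI pattern swallows.
    found = [("doi", m.group(0).rstrip(".,;)")) for m in DOI_RE.finditer(text)]
    found += [("arxiv", m.group(1)) for m in ARXIV_RE.finditer(text)]
    return found


def gate_citations(
    text: str,
    verify: Optional[Callable[[str, str], bool]] = None,
) -> str:
    """Annotate model output whose identifiers could not be confirmed.

    `verify` is a hypothetical hook for a live CrossRef/arXiv lookup.
    Passing None means live validation is unavailable.
    """
    ids = extract_identifiers(text)
    if not ids:
        return text
    if verify is None:
        return f"{text}\n[{PENDING}]"
    failed = [value for kind, value in ids if not verify(kind, value)]
    if failed:
        return f"{text}\n[Unverified identifiers: {', '.join(failed)}]"
    return text
```

Calling `gate_citations(draft)` with no verifier appends the pending notice to any draft that contains an identifier. Passing a verifier flags only the identifiers it rejects. Citations that carry no DOI or arXiv ID pass through untouched, so this gate complements retrieval grounding and does not replace it.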
Journey Context:
LLMs suffer from the 'fabricated citation failure mode', in which they hallucinate realistic titles, authors, and DOIs. This happens because the model learns the statistical structure of citations (e.g., 'arXiv:YYMM.NNNN') rather than the actual mapping from identifier to paper. Simply prompting 'do not hallucinate citations' fails. The only reliable fix is external grounding and verification, because in this domain the model cannot distinguish parametric memory from generated patterns.
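Because a fabricated citation can pair a real-looking DOI with the wrong title, external verification arguably needs to compare metadata as well as confirm that the identifier resolves. The sketch below assumes the public CrossRef REST endpoint `https://api.crossref.org/works/{doi}` and its JSON layout (`message.title` as a list of strings). The helper names and the strict normalized-title comparison are illustrative assumptions, not part of the original report.

```python
import json
import re
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable, Optional

CROSSREF_WORKS = "https://api.crossref.org/works/"


def crossref_url(doi: str) -> str:
    """Build the lookup URL, percent-encoding unusual DOI characters."""
    return CROSSREF_WORKS + urllib.parse.quote(doi, safe="/")


def fetch_json(url: str, timeout: float = 10.0) -> Optional[dict]:
    """Return parsed JSON, or None when the server answers 404.

    Other network errors propagate: an outage means "unknown", not
    "fabricated", so the caller should treat it as verification pending.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def normalize(title: str) -> str:
    """Lowercase and drop punctuation so trivial differences do not matter."""
    return " ".join(re.findall(r"[a-z0-9]+", title.lower()))


def doi_matches_title(
    doi: str,
    claimed_title: str,
    fetch: Callable[[str], Optional[dict]] = fetch_json,
) -> bool:
    """True only if the DOI resolves and its recorded title matches the claim."""
    record = fetch(crossref_url(doi))
    if record is None:
        return False
    titles = record.get("message", {}).get("title", [])
    # Assumption: exact match after normalization; a fuzzy match may suit
    # real data better (subtitles, transliteration, truncation).
    return any(normalize(t) == normalize(claimed_title) for t in titles)
```

The `fetch` parameter is injectable so the comparison logic can be exercised offline with a stub. In production it defaults to the live HTTP call.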
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T06:43:44.027289+00:00— report_created — created