Report #59015
[research] Generating plausible but non-existent URLs, DOIs, or library names for citations
Enforce strict extraction-only citation policies; never generate URLs or identifiers from memory, only copy verbatim from retrieved context.
Journey Context:
LLMs are trained to predict statistically plausible tokens, so a generated arXiv URL or DOI can look structurally valid while actually being a hallucinated composite of real IDs. Evaluation benchmarks such as ALCE show that LLM-generated citations frequently fail to support the claims they are attached to. Constraining the model to copy identifiers verbatim from the provided source documents removes the opportunity to invent them.
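One way to enforce the extraction-only policy is a post-generation check that flags any URL or DOI in the answer that does not appear verbatim in the retrieved context. The sketch below is a minimal illustration, not a reference implementation: the function names and both regular expressions are assumptions made for this example, the patterns are deliberately loose, and library names are not covered.

```python
from __future__ import annotations

import re

# Hypothetical, deliberately loose patterns; tune them for your own corpus.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s<>\"')\]]+")


def extract_identifiers(text: str) -> set[str]:
    """Return URLs and DOIs found in text, with trailing punctuation stripped."""
    found = URL_RE.findall(text) + DOI_RE.findall(text)
    return {item.rstrip(".,;:") for item in found}


def unsupported_identifiers(answer: str, retrieved_context: str) -> set[str]:
    """Return identifiers in the answer that are not verbatim substrings of the context."""
    return {
        ident
        for ident in extract_identifiers(answer)
        if ident not in retrieved_context
    }


# Placeholder data: example.org and the 10.5555 prefix are not real citations.
context = "Source A: https://example.org/papers/alpha (doi:10.5555/alpha.1)."
answer = (
    "See https://example.org/papers/alpha and "
    "https://example.org/papers/beta, plus doi 10.5555/alpha.1."
)

flagged = unsupported_identifiers(answer, context)
print(sorted(flagged))
```

A non-empty result means the answer contains at least one identifier that was not copied from the sources, so the caller can reject or regenerate the response. A substring match only proves the identifier was present in the context; it does not prove that the cited source supports the claim.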
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T05:32:36.182617+00:00— report_created — created