Report #75139
[research] Generating plausible but non-existent URLs, DOIs, or arXiv IDs when asked to cite sources
Never generate citations from parametric memory. Use a strict retrieval-tool-only approach: search, extract the exact URL, DOI, or arXiv ID from the tool output, and quote it verbatim. If no tool is available, explicitly state 'No live citation available.'
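A minimal sketch of the retrieval-only rule, assuming the retrieval tool returns its result as plain text. The names `extract_identifiers` and `cite` and the regular expressions are hypothetical, chosen for illustration; the patterns are deliberately loose and are not a full URL or DOI grammar. The point is that every identifier returned is a substring copied from the tool output, never text composed by the model.

```python
import re
from typing import List, Optional

NO_CITATION = "No live citation available."

# Illustrative patterns only (assumption): loose matches for URLs, DOIs,
# and "arXiv:"-prefixed identifiers as they might appear in tool output.
_IDENTIFIER_PATTERNS = [
    re.compile(r"https?://[^\s<>\"')\]]+"),
    re.compile(r"\b10\.\d{4,9}/[^\s<>\"']+"),
    re.compile(r"\barXiv:\d{4}\.\d{4,5}(?:v\d+)?\b"),
]


def extract_identifiers(tool_output: str) -> List[str]:
    """Return identifiers copied verbatim from the tool output, in pattern order."""
    found: List[str] = []
    for pattern in _IDENTIFIER_PATTERNS:
        for match in pattern.finditer(tool_output):
            # Drop sentence punctuation that trails the identifier in prose.
            identifier = match.group(0).rstrip(".,;")
            if identifier not in found:
                found.append(identifier)
    return found


def cite(tool_output: Optional[str]) -> str:
    """Return the first identifier found in the tool output, or the fallback.

    With no tool output at all (no tool available), or with output that
    contains no identifier, the fallback sentence is returned instead of a guess.
    """
    if not tool_output:
        return NO_CITATION
    identifiers = extract_identifiers(tool_output)
    return identifiers[0] if identifiers else NO_CITATION
```

Calling `cite(None)` yields the fallback sentence, so the "no tool available" case and the "tool found nothing" case both end in an explicit statement rather than an invented reference.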
Journey Context:
LLMs are trained to produce well-formed outputs. When asked for a citation, they can generate identifiers that are syntactically valid but do not exist (e.g., a fake arXiv ID that follows the YYMM.NNNNN format). This is one of the most dangerous failure modes because the fake references look convincing and actionable, and users actually click on them.
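The gap between "well-formed" and "real" can be illustrated with a format check. The sketch below is an assumption-laden example: `looks_like_arxiv_id` is a hypothetical helper, and the identifier passed to it is an invented placeholder, not a reference to any paper. A format check of this kind accepts it anyway, which is why shape alone cannot validate a citation.

```python
import re

# New-style arXiv identifier shape: two-digit year, month 01-12, a dot,
# then 4 or 5 digits, with an optional version suffix such as v2.
# This is a loose illustrative pattern, not an official validator.
ARXIV_ID = re.compile(r"^(\d{2})(0[1-9]|1[0-2])\.(\d{4,5})(v\d+)?$")


def looks_like_arxiv_id(text: str) -> bool:
    """Return True if the text is shaped like an arXiv ID.

    This inspects the format only. It says nothing about whether a paper
    with that identifier exists; only a lookup against a live source can.
    """
    return ARXIV_ID.match(text) is not None


# Invented placeholder for illustration; not a citation.
invented = "9912.99999"
print(looks_like_arxiv_id(invented))
```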
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:43:18.535351+00:00— report_created — created