Report #11940
[research] LLMs hallucinate plausible-looking academic citations, DOIs, and URLs that resolve to 404s
Never trust model-generated URLs or DOIs without programmatic validation (for example, an HTTP GET that confirms the link resolves). When citing, constrain the model to copy identifiers verbatim from the provided context, or use a tool or API (such as Semantic Scholar or Crossref) to verify that the work exists before presenting it to the user.
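A minimal sketch of the programmatic check described above, using only the Python standard library. The helper names (`http_status`, `doi_exists`, `url_resolves`) and the User-Agent string are illustrative assumptions, not part of the report. The DOI lookup assumes the Crossref REST endpoint `https://api.crossref.org/works/<doi>`, which answers 404 for DOIs it does not know; DOIs registered with other agencies would need a different lookup.

```python
import http.client
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable

# Assumed lookup endpoint: Crossref's public REST API for registered works.
CROSSREF_WORKS = "https://api.crossref.org/works/"


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status of a GET to `url`, or 0 if the request fails."""
    try:
        request = urllib.request.Request(
            url, headers={"User-Agent": "citation-checker/0.1"}  # hypothetical UA
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses land here; the code tells us the link is dead.
        return err.code
    except (OSError, ValueError, http.client.HTTPException):
        # DNS failure, timeout, malformed URL, broken response, etc.
        return 0


def doi_exists(doi: str, fetch_status: Callable[[str], int] = http_status) -> bool:
    """True only if the lookup endpoint confirms the DOI with a 200."""
    url = CROSSREF_WORKS + urllib.parse.quote(doi.strip(), safe="/")
    return fetch_status(url) == 200


def url_resolves(url: str, fetch_status: Callable[[str], int] = http_status) -> bool:
    """True only if the URL ends in a 2xx response after redirects."""
    return 200 <= fetch_status(url) < 300
```

The `fetch_status` parameter is there so the check can be exercised offline with a stub. In production, anything that does not return `True` should be dropped or flagged rather than shown to the user. Some sites reject automated requests, so a failed check means "unverified", not necessarily "fabricated".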
Journey Context:
LLMs learn the statistical distribution of URL and DOI formats (e.g., https://doi.org/10.xxxx/...) rather than memorizing the exact title-to-DOI mapping, so they confidently generate links that are structurally valid but do not resolve. Regex validation is insufficient because it checks only the format, not whether the target exists; only a live HTTP check or strict grounding against a provided document prevents this failure mode.
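The gap between format checking and grounding can be sketched as follows. The regular expressions are simplified assumptions (real DOI and URL syntax is broader), the function names are hypothetical, and the identifiers in the usage example are made-up placeholders.

```python
import re
from typing import List

# Simplified patterns, assumed for illustration only.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")
IDENTIFIER_PATTERN = re.compile(
    r"https?://[^\s)\]>\"']+|10\.\d{4,9}/[^\s)\]>\"']+"
)


def looks_like_doi(text: str) -> bool:
    """Format check only: says nothing about whether the DOI exists."""
    return DOI_PATTERN.fullmatch(text) is not None


def ungrounded_identifiers(answer: str, context: str) -> List[str]:
    """Return URLs/DOIs in `answer` that do not appear verbatim in `context`."""
    missing: List[str] = []
    for match in IDENTIFIER_PATTERN.findall(answer):
        candidate = match.rstrip(".,;")  # drop trailing sentence punctuation
        if candidate not in context and candidate not in missing:
            missing.append(candidate)
    return missing


# Usage with placeholder data: the second DOI passes the regex but is not
# present in the source document, so grounding flags it.
context = "See Smith 2020, https://doi.org/10.1234/abc.5678 for details."
answer = (
    "Smith (https://doi.org/10.1234/abc.5678). "
    "Also Jones, doi 10.9999/fake.2021.001."
)
flagged = ungrounded_identifiers(answer, context)
```

Here `looks_like_doi` accepts the invented identifier, while `ungrounded_identifiers` rejects it because it never occurs in the provided context. Exact substring matching is deliberately strict: it will also flag a real identifier that the model reformatted, which is usually the safer failure.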
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T14:43:16.812490+00:00— report_created — created