Report #43802
[research] LLM generates plausible but non-existent URLs, DOIs, or academic citations
Never trust model-generated URLs or citations without programmatic validation. Implement a strict verification step (e.g., HTTP HEAD request for URLs, cross-referencing DOIs via API) before surfacing them to the user.
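A minimal sketch of such a verification step, using only the Python standard library. The function names `url_resolves` and `doi_is_registered` are hypothetical, and the DOI check assumes the doi.org proxy's handle lookup endpoint (`https://doi.org/api/handles/<doi>`) answers with HTTP 200 for registered DOIs and an error status otherwise; a registry API such as Crossref's could be swapped in. Some servers reject HEAD requests, so a failed check means "unverified", not necessarily "does not exist".

```python
import http.client
import urllib.error
import urllib.parse
import urllib.request


def url_resolves(url: str, timeout: float = 5.0, opener=urllib.request.urlopen) -> bool:
    """Return True only if a HEAD request to `url` gets a 2xx/3xx response."""
    parsed = urllib.parse.urlsplit(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False  # malformed or non-web URL: reject without a network call
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # URLError, HTTPError (e.g. 404) and timeouts are all OSError subclasses.
        return False


def doi_is_registered(doi: str, timeout: float = 5.0, opener=urllib.request.urlopen) -> bool:
    """Return True only if the DOI lookup endpoint reports the DOI as known."""
    # Assumption: the doi.org handle lookup endpoint described in the lead-in.
    lookup = "https://doi.org/api/handles/" + urllib.parse.quote(doi, safe="/")
    request = urllib.request.Request(lookup, method="GET")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        return False
```

The `opener` parameter exists so the network call can be replaced in tests or routed through a rate-limited client. Anything that returns `False` should be dropped or flagged as unverified before the answer reaches the user.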
Journey Context:
LLMs are trained to predict plausible token sequences, so they generate identifiers that are syntactically valid but refer to nothing (e.g., a valid-looking arXiv ID that doesn't exist). Relying on the model to 'know' whether a citation is real is a fundamental category error; the model only knows the pattern of citations. Programmatic grounding is the only reliable fix.
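The gap between "matches the pattern" and "exists" can be illustrated with a short sketch. The helper names here are hypothetical, the regular expression covers only new-style arXiv identifiers (`YYMM.NNNN` or `YYMM.NNNNN`, with an optional version suffix), and `exists` stands in for whatever real registry lookup is used; in this sketch it is just a caller-supplied function.

```python
import re
from typing import Callable, Iterable, List, Tuple

# New-style arXiv identifier format only; a match says nothing about existence.
ARXIV_ID = re.compile(r"\d{4}\.\d{4,5}(v\d+)?")


def looks_like_arxiv_id(candidate: str) -> bool:
    """Syntactic check: does the string have the shape of an arXiv ID?"""
    return ARXIV_ID.fullmatch(candidate) is not None


def keep_verified(
    candidates: Iterable[str], exists: Callable[[str], bool]
) -> Tuple[List[str], List[str]]:
    """Split candidates into (verified, rejected) using an external lookup."""
    verified: List[str] = []
    rejected: List[str] = []
    for candidate in candidates:
        # Shape is checked first so obviously malformed IDs never hit the lookup.
        if looks_like_arxiv_id(candidate) and exists(candidate):
            verified.append(candidate)
        else:
            rejected.append(candidate)
    return verified, rejected
```

Both a real and a made-up identifier pass `looks_like_arxiv_id`; only the `exists` lookup separates them, which is why the decision cannot be left to the model or to a format check.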
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T03:59:37.756784+00:00— report_created — created