Report #13039
[research] Generating plausible but non-existent URLs or DOIs for citations
Restrict generation so that it outputs only URLs or DOIs that appear verbatim in the retrieved context, or add a programmatic validation step (e.g., an HTTP HEAD request or a DOI resolver check) before presenting the citation to the user.
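A minimal sketch of the verbatim restriction, assuming the model's answer and the retrieved context are both available as plain strings. The regular expressions and the names extract_citations and split_citations are hypothetical and only for illustration; real URL and DOI syntax is broader than these patterns cover.

```python
from __future__ import annotations

import re

# Illustrative patterns only (assumption): they catch common URL and DOI shapes,
# not every form allowed by the relevant specifications.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s<>\"')\]]+")


def extract_citations(text: str) -> set[str]:
    """Collect URL-like and DOI-like strings, trimming trailing punctuation."""
    found = URL_RE.findall(text) + DOI_RE.findall(text)
    return {item.rstrip(".,;:") for item in found}


def split_citations(answer: str, context: str) -> tuple[set[str], set[str]]:
    """Return (grounded, ungrounded) citations found in the answer.

    A citation counts as grounded only if the identical string
    also occurs in the retrieved context.
    """
    allowed = extract_citations(context)
    cited = extract_citations(answer)
    return cited & allowed, cited - allowed


context = "See https://github.com/org/repo/issues/123 and doi:10.1000/xyz."
answer = (
    "Fixed in https://github.com/org/repo/issues/123, "
    "also https://github.com/org/repo/issues/999 (10.1000/abc)."
)
grounded, ungrounded = split_citations(answer, context)
```

Anything in ungrounded would be dropped or flagged before the answer is shown. Exact string matching is deliberately strict: a URL that the model altered by even one character is treated as not present in the context.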
Journey Context:
LLMs learn the structural patterns of citations (e.g., github.com/org/repo/issues/123 or doi.org/10.1000/xyz) and generate links that are syntactically valid but hallucinated. Structural validity does not imply existence. RAG grounding alone fails if the model is allowed to paraphrase or synthesize URLs from the retrieved text. Programmatic validation is the only reliable circuit breaker.
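The validation step could look roughly like the sketch below, which uses only the Python standard library. The names to_url, head_status, and citation_exists are hypothetical, and treating any 2xx or 3xx status as "exists" is an assumption: some servers reject HEAD requests or block automated clients, so a failed check may need a fallback (for example a GET request) rather than being taken as proof that the link is fake.

```python
from __future__ import annotations

import urllib.error
import urllib.request


def to_url(citation: str) -> str:
    """Map a bare DOI to its doi.org resolver URL; leave URLs unchanged."""
    if citation.startswith(("http://", "https://")):
        return citation
    return "https://doi.org/" + citation


def head_status(url: str, timeout: float = 5.0) -> int | None:
    """Send an HTTP HEAD request; return the status code, or None on failure."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (OSError, ValueError):
        # DNS failure, timeout, refused connection, or malformed URL.
        return None


def citation_exists(citation: str, fetch_status=head_status) -> bool:
    """Treat a citation as existing only if the check returns a 2xx or 3xx status."""
    status = fetch_status(to_url(citation))
    return status is not None and 200 <= status < 400
```

The fetch_status parameter exists so the check can be exercised without network access, for instance by passing the get method of a dictionary that maps URLs to canned status codes. A passing check shows only that the link resolves, not that the target supports the claim it is cited for.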
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T17:40:18.288312+00:00— report_created — created