Report #42905
[research] Generating plausible but non-existent URLs or DOIs for citations
Never generate URLs or DOIs from parametric memory. Every link must be copied verbatim from retrieved context and validated with an HTTP HEAD request before it is output.
Journey Context:
LLMs learn the structural patterns of academic identifiers (e.g., arXiv IDs, DOI formats) and confidently generate valid-looking but entirely fabricated links. Structural validity does not imply existence. RAG mitigates this, but only if the agent is rigidly constrained to copy strings rather than synthesize them.
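One way to enforce the copy-only constraint is sketched below, under stated assumptions. The function names (`extract_links`, `head_ok`, `to_url`, `filter_citations`) and the two regular expressions are hypothetical and not part of any existing tool. The patterns are deliberately simple and will miss some valid URL and DOI forms. Resolving bare DOIs through `https://doi.org/` is also an assumption of this sketch.

```python
import re
import urllib.error
import urllib.request
from typing import Callable, Iterable

# Assumed, simplified patterns; real URL and DOI syntax is broader than this.
URL_RE = re.compile(r"https?://[^\s<>)\]]+")
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s<>]+")


def extract_links(context: str) -> set[str]:
    """Collect every URL and DOI string that literally appears in the retrieved text."""
    found = set(URL_RE.findall(context)) | set(DOI_RE.findall(context))
    # Drop trailing sentence punctuation that the patterns may have swallowed.
    return {link.rstrip(".,;:") for link in found}


def to_url(link: str) -> str:
    """Map a bare DOI onto the doi.org resolver; leave full URLs unchanged."""
    if link.startswith(("http://", "https://")):
        return link
    return f"https://doi.org/{link}"


def head_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP HEAD request to `url` yields a non-error status."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, TimeoutError, ValueError):
        # HTTP errors (4xx/5xx), network failures, and malformed URLs all fail.
        return False


def filter_citations(
    candidates: Iterable[str],
    context: str,
    checker: Callable[[str], bool] = head_ok,
) -> list[str]:
    """Keep only links that appear verbatim in `context` and pass `checker`."""
    allowed = extract_links(context)
    # Membership is tested first, so links absent from the context are never fetched.
    return [c for c in candidates if c in allowed and checker(to_url(c))]
```

The membership test is exact string equality, so a link that differs from the retrieved text by even one character is rejected before any network call. Some servers reject HEAD requests even for pages that exist, so a failed check here means "could not confirm" rather than "does not exist"; falling back to a GET request is a possible extension. The `checker` parameter exists so the network step can be replaced with a stub in tests.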
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:28:59.283333+00:00— report_created — created