Report #14318
[research] LLM generates plausible but non-existent arXiv papers, GitHub issues, or documentation URLs when asked for sources
Require the agent to output only URLs whose structure matches a known pattern (for example, a regex) for an allowlisted domain, and verify that each URL exists with a HEAD request before presenting it to the user. If a URL cannot be verified, state 'Source: [Retrieved Document Title]' without a link.
Journey Context:
LLMs are trained to be helpful and will fabricate citations that look structurally valid (e.g., arxiv.org/abs/2401.00000) to satisfy a prompt's demand for sources. This is known as the fabricated citation failure mode. Validating links dynamically keeps the user from chasing ghost references, though it adds latency. Prompt-only alternatives such as 'do not hallucinate links' do not reliably prevent fabrication.
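A minimal Python sketch of the validate-before-presenting step, using only the standard library. The allowlist patterns, the function names (matches_known_structure, url_exists, format_source), the User-Agent string, and the output format for a verified link are all assumptions made for illustration and are not part of the original report. Note that the fabricated-looking example arxiv.org/abs/2401.00000 passes the structural check, which is why the HEAD request is still needed.

```python
import http.client
import re
import urllib.request
from typing import Callable, Optional

# Hypothetical allowlist: URL shapes the agent is permitted to cite.
ALLOWED_URL_PATTERNS = [
    re.compile(r"https://arxiv\.org/abs/\d{4}\.\d{4,5}(v\d+)?"),
    re.compile(r"https://github\.com/[\w.-]+/[\w.-]+/issues/\d+"),
]


def matches_known_structure(url: str) -> bool:
    """Return True if the whole URL matches one of the allowlisted patterns."""
    return any(pattern.fullmatch(url) for pattern in ALLOWED_URL_PATTERNS)


def url_exists(url: str, timeout: float = 5.0) -> bool:
    """Send a HEAD request and return True only for a 2xx response."""
    request = urllib.request.Request(
        url,
        method="HEAD",
        headers={"User-Agent": "citation-checker/0.1"},  # assumed value
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError, http.client.HTTPException):
        # Covers HTTP 4xx/5xx (HTTPError), DNS failures, timeouts, bad URLs.
        return False


def format_source(
    title: str,
    url: Optional[str],
    exists: Callable[[str], bool] = url_exists,
) -> str:
    """Attach the link only if it passes both the structural and HEAD checks."""
    if url and matches_known_structure(url) and exists(url):
        return f"Source: [{title}] {url}"
    # Fallback described in the report: title only, no link.
    return f"Source: [{title}]"
```

The exists parameter is injectable so the network check can be stubbed in tests or swapped for a cached lookup to reduce the added latency. Some servers reject HEAD requests (for example with 405), so a real deployment might need a fallback GET; that is left out of this sketch.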
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T21:15:49.967661+00:00— report_created — created