Report #77222
[research] Hallucinated academic citations and fabricated URLs in generated text
Never generate a URL or citation from memory. If a citation is required, use a retrieval tool to fetch a real URL, or explicitly state that the citation is unverified. Where possible, confirm that each URL resolves by sending a HEAD request before including it.
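The HEAD-request check can be sketched as follows. This is a minimal illustration using only the Python standard library, not a definitive implementation: the function name `verify_url`, the `opener` parameter (added so the check can be exercised without network access), and the "verified"/"unverified" labels are all assumptions introduced here.

```python
import urllib.error
import urllib.request


def verify_url(url, opener=urllib.request.urlopen, timeout=5.0):
    """Return "verified" if a HEAD request to `url` succeeds, else "unverified".

    `opener` is a hypothetical injection point (defaults to urlopen) so the
    function can be tested offline with a stand-in.
    """
    # Only plain web URLs are checked; anything else stays unverified.
    if not url.startswith(("http://", "https://")):
        return "unverified"

    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            status = response.status
    except (urllib.error.URLError, ValueError, OSError):
        # Covers DNS failures, timeouts, malformed URLs, and 4xx/5xx
        # responses (urlopen raises HTTPError for those).
        return "unverified"

    return "verified" if 200 <= status < 400 else "unverified"
```

A "verified" result only shows that the URL resolves; it does not show that the page supports the claim being cited. Some servers also reject HEAD requests, so an "unverified" result means "not confirmed", not "fake".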
Journey Context:
LLMs are trained to predict plausible tokens, so they generate highly realistic but fake DOIs, arXiv IDs, and GitHub URLs. This is a known failure mode of both retrieval-augmented generation (RAG) and plain generation. The tradeoff is speed vs. accuracy: generating from memory is fast but fundamentally unreliable for exact URLs and identifiers. Grounding via tool use is the only safe path.
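One way to apply this is to scan generated text for anything that looks like a DOI, arXiv ID, or URL and treat each hit as unverified until a retrieval tool confirms it. The sketch below is an assumption-laden illustration: the regular expressions are rough heuristics for spotting candidates (they do not validate anything), and `find_unverified_references` is a hypothetical helper name.

```python
import re

# Heuristic patterns for spotting candidates only; matching a pattern says
# nothing about whether the identifier actually exists.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+", re.IGNORECASE)
ARXIV_RE = re.compile(r"\barXiv:\s*(\d{4}\.\d{4,5}(?:v\d+)?)", re.IGNORECASE)
URL_RE = re.compile(r"https?://[^\s\"<>)]+")


def find_unverified_references(text):
    """Return every DOI-, arXiv-, or URL-like string in `text`, marked unverified."""
    found = []
    for kind, pattern in (("doi", DOI_RE), ("arxiv", ARXIV_RE), ("url", URL_RE)):
        for match in pattern.finditer(text):
            # Use the capture group when the pattern defines one (arXiv ID).
            value = match.group(1) if pattern.groups else match.group(0)
            found.append({
                "kind": kind,
                "value": value.rstrip(".,;"),  # drop trailing sentence punctuation
                "status": "unverified",        # stays so until a tool confirms it
            })
    return found
```

Each returned entry can then be passed to a retrieval or resolution step; anything that cannot be confirmed should be removed or labeled as unverified in the final output.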
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T12:13:00.531222+00:00— report_created — created