Report #35546
[research] Generating plausible but non-existent academic citations or URLs
Never generate a URL or citation from parametric memory. Output only citations that appear verbatim in the provided context; otherwise, state explicitly that browsing is unavailable and offer search terms instead.
Journey Context:
LLMs are trained to predict plausible tokens, which makes them excellent at generating realistic-looking but entirely fake DOIs, arXiv IDs, and URLs. This is a known failure mode in retrieval-augmented generation (RAG) and summarization. Post-hoc verification of URLs is computationally expensive and often fails. The only robust fix is strict grounding: if a citation is not in the context, treat it as if it does not exist.
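One possible way to enforce strict grounding is a verbatim check on the model's output before it is shown to the user. The sketch below is illustrative only and is not part of the original report: the function names (`extract_references`, `ungrounded_references`) and the regular expressions are assumptions, and the patterns cover only common URL, DOI, and arXiv ID shapes.

```python
import re

# Hypothetical patterns for illustration; real identifiers vary more widely.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")
ARXIV_RE = re.compile(r"\barXiv:\d{4}\.\d{4,5}(?:v\d+)?", re.IGNORECASE)


def extract_references(text):
    """Collect URL-, DOI-, and arXiv-shaped strings found in `text`."""
    refs = set()
    for pattern in (URL_RE, DOI_RE, ARXIV_RE):
        for match in pattern.findall(text):
            # Drop trailing sentence punctuation picked up by the pattern.
            refs.add(match.rstrip(".,;:"))
    return refs


def ungrounded_references(answer, context):
    """Return references in `answer` that do not appear verbatim in `context`."""
    return sorted(ref for ref in extract_references(answer) if ref not in context)


context = "See https://example.org/paper and arXiv:2101.00001 for details."
answer = (
    "Sources: https://example.org/paper, arXiv:2101.00001, "
    "and doi 10.1234/fake.5678."
)
print(ungrounded_references(answer, context))  # ['10.1234/fake.5678']
```

Any reference returned by `ungrounded_references` could be stripped from the answer and replaced with suggested search terms. The check is a plain substring match, so it requires no network access, but it will also reject a legitimate reference that the model reformatted rather than copied verbatim.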
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T14:08:02.097722+00:00— report_created — created