Report #15814
[research] LLM generating plausible but non-existent academic citations or URLs
Never generate a URL or citation from memory. Only output URLs and citations that appear verbatim in the provided context; for any other source, explicitly label it as unverified.
Journey Context:
LLMs are trained to predict plausible token sequences, so they can invent realistic-looking DOIs and URLs that do not resolve (typically returning 404 errors). This is a known failure mode in retrieval-augmented generation (RAG) and academic search. Copying citations verbatim from the provided context is the only reliable mitigation, because the model's learned distribution over citation-like tokens strongly favors fluent fabrications over exact recall.
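A minimal sketch of an output-side check that enforces the rule above after generation, assuming Python 3 and only the standard-library `re` module. The names `REF_RE`, `extract_refs`, and `flag_unverified` are hypothetical, and the URL/DOI patterns are deliberately simplified assumptions rather than full grammars. Any URL or DOI in the model output that is not copied verbatim from the provided context gets an `[unverified]` tag.

```python
from __future__ import annotations

import re

# Assumed, deliberately loose patterns (not full URL or DOI grammars):
# an http(s) URL, or a bare DOI of the form "10.<registrant>/<suffix>".
_REF_CHARS = r"""[^\s<>"'()\[\]]+"""
REF_RE = re.compile(r"https?://" + _REF_CHARS + r"|\b10\.\d{4,9}/" + _REF_CHARS)

# Sentence punctuation the regex may swallow at the end of a reference.
_TRAILING = ".,;:"


def extract_refs(text: str) -> set[str]:
    """Return every URL/DOI string found in text, minus trailing punctuation."""
    return {m.rstrip(_TRAILING) for m in REF_RE.findall(text)}


def flag_unverified(output: str, context: str) -> tuple[str, list[str]]:
    """Tag each URL/DOI in output that does not appear verbatim in context.

    Returns the annotated output and the unverified references in order of
    first appearance. The check is exact string membership, so a bare DOI
    and its doi.org URL count as different references.
    """
    allowed = extract_refs(context)
    unverified: list[str] = []

    def annotate(match: re.Match) -> str:
        raw = match.group(0)
        ref = raw.rstrip(_TRAILING)
        if ref in allowed:
            return raw  # copied verbatim from the provided context: keep as-is
        if ref not in unverified:
            unverified.append(ref)
        # Mark the reference and re-attach any stripped punctuation.
        return f"{ref} [unverified]{raw[len(ref):]}"

    return REF_RE.sub(annotate, output), unverified
```

This check does not prove that a tagged reference is fake, and it cannot prove that an untagged one resolves; it only guarantees that nothing outside the context reaches the reader without an explicit unverified label. Matching whole extracted references, rather than raw substrings of the context, avoids accepting a fabricated URL that happens to be a prefix of a real one.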
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T01:11:25.209690+00:00— report_created — created