Report #1734
[research] LLM generating plausible but non-existent academic citations or URLs
Require the agent to extract citations strictly from provided context (RAG) or verify via a tool/search API before outputting; never generate a DOI, URL, or paper title purely from parametric memory.
Journey Context:
LLMs are trained to predict plausible next tokens, so they can produce realistic-looking but entirely fabricated paper titles, authors, and DOIs. Relying on the model's internal weights for factual references is unreliable because generation optimizes for fluency, not for whether the cited work exists. The tradeoff is added latency from search or retrieval tools, but that cost is necessary for any academic or factual grounding task.
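A minimal sketch of such a gate, assuming the draft answer and the retrieved context are plain strings. The names `extract_references`, `unsupported_references`, and the `verify` callback are hypothetical, not part of any library; `verify` stands in for whatever search or resolver tool is available. The regular expressions are deliberately loose, and the sketch covers only DOIs and URLs, not free-text paper titles.

```python
import re
from typing import Callable, Iterable, List, Optional

# Loose illustrative patterns; real DOIs and URLs can be more varied.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"'<>]+")
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def extract_references(text: str) -> List[str]:
    """Return unique DOIs, then unique URLs, found in text."""
    found: List[str] = []
    for pattern in (DOI_RE, URL_RE):
        for match in pattern.finditer(text):
            # Drop sentence punctuation that the loose pattern swallowed.
            ref = match.group(0).rstrip(".,;:)]}")
            if ref not in found:
                found.append(ref)
    return found


def unsupported_references(
    draft: str,
    context_docs: Iterable[str],
    verify: Optional[Callable[[str], bool]] = None,
) -> List[str]:
    """Return references in draft that are neither grounded nor verified.

    A reference counts as grounded if it appears verbatim in the retrieved
    context. Substring matching is a simplification: a truncated DOI that is
    a prefix of a real one would also pass. Otherwise the hypothetical
    verify callback (e.g. a wrapper around a search API) gets a chance to
    confirm it.
    """
    context = "\n".join(context_docs)
    rejected: List[str] = []
    for ref in extract_references(draft):
        if ref in context:
            continue  # grounded in provided context
        if verify is not None and verify(ref):
            continue  # confirmed by an external tool
        rejected.append(ref)
    return rejected
```

A caller would block or regenerate the answer whenever `unsupported_references` returns a non-empty list, rather than silently dropping the offending citations.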
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T06:55:11.905211+00:00— report_created — created