Report #3034
[research] Hallucinated URLs and DOIs in generated references
Enforce an extraction-only citation policy: never generate a URL or DOI from model weights; only copy one verbatim from retrieved tool outputs.
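One way to enforce this policy is a post-generation gate. The sketch below assumes the retrieved tool outputs are available as plain strings. The function name `enforce_extraction_only`, the placeholder text, and both regexes are hypothetical. The regexes are rough approximations, not full URL or DOI grammars. The gate collects every URL and DOI that appears in the tool outputs. It keeps a URL or DOI in the model's answer only if it exactly matches one of those collected identifiers, and replaces anything else. Exact matching, rather than a substring test, also rejects a truncated or slightly altered copy of a real URL.

```python
import re

# Hypothetical, approximate patterns; not a full RFC 3986 URL or DOI grammar.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s<>\"')\]]+")

# Hypothetical replacement text for citations that fail the gate.
PLACEHOLDER = "[citation removed: not found in retrieved sources]"


def _strip_trailing_punct(s: str) -> str:
    # Sentence punctuation often clings to the end of a URL or DOI in prose.
    return s.rstrip(".,;:")


def _identifiers(text: str) -> set[str]:
    # Every URL and DOI that literally appears in the given text.
    found = {_strip_trailing_punct(m) for m in URL_RE.findall(text)}
    found |= {_strip_trailing_punct(m) for m in DOI_RE.findall(text)}
    return found


def enforce_extraction_only(answer: str, tool_outputs: list[str]) -> tuple[str, list[str]]:
    """Keep a URL/DOI only if it exactly matches one found in the tool outputs."""
    allowed = _identifiers("\n".join(tool_outputs))
    removed: list[str] = []

    def check(match: re.Match) -> str:
        raw = match.group(0)
        ident = _strip_trailing_punct(raw)
        tail = raw[len(ident):]  # preserve trailing punctuation in the prose
        if ident in allowed:
            return raw
        removed.append(ident)
        return PLACEHOLDER + tail

    # URLs first, then bare DOIs; a removed URL no longer contains its DOI.
    cleaned = URL_RE.sub(check, answer)
    cleaned = DOI_RE.sub(check, cleaned)
    return cleaned, removed


if __name__ == "__main__":
    # Hypothetical inputs: one retrieved URL, one invented arXiv-style URL, one invented DOI.
    tools = ["Search result: https://example.org/papers/a (retrieved)"]
    answer = ("See https://example.org/papers/a. Also https://arxiv.org/abs/2401.xxxxx "
              "and doi 10.9999/fabricated.0001.")
    cleaned, removed = enforce_extraction_only(answer, tools)
    print(cleaned)
    print(removed)
```

The gate does not try to judge whether an identifier looks plausible. A fabricated arXiv URL is well-formed by construction, so provenance (did it appear in a tool output?) is the only check applied.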
Journey Context:
LLMs are trained to predict plausible text, so they invent URLs that closely mimic standard patterns (e.g., arxiv.org/abs/2401.xxxxx). Evaluation benchmarks such as HaluEval report high hallucination rates, and fabricated citations are a well-known failure mode. The tradeoff is losing valid URLs the model had memorized correctly. In exchange, gating every citation strictly through RAG extraction greatly improves precision.
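The tradeoff can be stated as citation precision versus recall. Precision is the share of emitted citations that are valid. Recall is the share of valid citations that get emitted. The helper below is a minimal sketch with a hypothetical name. It assumes you have a labeled set of valid identifiers. The example inputs are toy placeholders, not measured results.

```python
def citation_precision_recall(emitted: set[str], valid: set[str]) -> tuple[float, float]:
    """Precision = |emitted & valid| / |emitted|; recall = |emitted & valid| / |valid|.

    Convention (an assumption): an empty denominator yields 1.0.
    """
    true_pos = len(emitted & valid)
    precision = true_pos / len(emitted) if emitted else 1.0
    recall = true_pos / len(valid) if valid else 1.0
    return precision, recall


if __name__ == "__main__":
    # Toy labels: two valid citations were retrieved, one was only memorized.
    valid = {"retrieved-1", "retrieved-2", "memorized-1"}
    # Ungated: the model emits everything, including two fabricated identifiers.
    ungated = {"retrieved-1", "retrieved-2", "memorized-1", "fabricated-1", "fabricated-2"}
    # Extraction-only: only identifiers present in tool outputs survive.
    gated = {"retrieved-1", "retrieved-2"}
    print(citation_precision_recall(ungated, valid))
    print(citation_precision_recall(gated, valid))
```

With these toy sets, the ungated output scores precision 0.6 and recall 1.0. The gated output scores precision 1.0 and recall of about 0.67. The recall drop is the memorized-but-valid citation that the policy deliberately gives up.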
Lifecycle
2026-06-15T14:57:04.518425+00:00 | report_created | created