Report #22779
[research] Hallucinated academic citations and broken DOIs in generated text
Never generate DOIs or URLs from parametric memory; strictly extract them from retrieved documents or use structural templates without inventing identifiers.
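A minimal Python sketch of the "extract, don't generate" rule, assuming the retrieved documents are available as plain strings. The regex is a heuristic adapted from the pattern Crossref has published for matching most modern DOIs, so it will miss some legacy identifiers. extract_dois is a hypothetical helper, and the 10.1000/... strings are placeholders, not real references. Every value it returns is a substring of a source document; nothing comes from the model's memory.

```python
import re

# Heuristic DOI matcher adapted from Crossref's published pattern for modern DOIs.
# It deliberately ignores some legacy formats (assumption: that is acceptable here).
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def extract_dois(retrieved_docs: list[str]) -> list[str]:
    """Return DOIs that occur verbatim in the retrieved documents, in first-seen order.

    Nothing is generated here: every returned string is a substring of a source.
    """
    seen: dict[str, None] = {}  # dict keeps insertion order and drops duplicates
    for doc in retrieved_docs:
        for match in DOI_PATTERN.finditer(doc):
            # Drop sentence punctuation that the character class also accepts.
            doi = match.group(0).rstrip(".,;:")
            seen.setdefault(doi, None)
    return list(seen)


if __name__ == "__main__":
    docs = [
        "See doi:10.1000/xyz123.",
        "Mirror: https://doi.org/10.1000/abc.456 and again 10.1000/xyz123",
    ]
    print(extract_dois(docs))  # ['10.1000/xyz123', '10.1000/abc.456']
```

The extracted list can then be the only source allowed to fill the identifier slot of a citation template; if it is empty, the slot stays empty instead of being invented.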
Journey Context:
LLMs are trained to output well-formed structures, so they readily generate plausible-looking but fake DOIs (e.g., 10.1000/xyz). Checking URL validity after generation is expensive and slow, because every link needs a network round trip. The only reliable fix is strict grounding: if the exact identifier string is not in the retrieved context, the model must not output it. Prompting 'do not hallucinate' fails because the model cannot tell where its knowledge ends and fabrication begins.
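The grounding rule can be enforced as a cheap local string check instead of a network check. A minimal sketch, assuming the retrieved context is one string: every DOI- or URL-like token in the model output must exactly match an identifier extracted from the context; anything else is replaced with a placeholder and reported. enforce_grounding, both regexes, and the placeholder text are illustrative assumptions, not an existing library API.

```python
import re

# Illustrative patterns (assumptions): a heuristic DOI matcher and a loose http(s) URL matcher.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")
TRAILING = ".,;:"  # sentence punctuation that is not part of the identifier
PLACEHOLDER = "[identifier not found in sources]"


def _identifiers(text: str) -> set[str]:
    """Collect every URL and DOI string that appears in text."""
    found: set[str] = set()
    for pattern in (URL_RE, DOI_RE):
        for m in pattern.finditer(text):
            found.add(m.group(0).rstrip(TRAILING))
    return found


def enforce_grounding(generated: str, context: str) -> tuple[str, list[str]]:
    """Replace identifiers in generated text that do not appear exactly in context.

    Returns the cleaned text and the list of removed identifiers.
    """
    allowed = _identifiers(context)
    removed: list[str] = []

    def check(m: re.Match) -> str:
        raw = m.group(0)
        ident = raw.rstrip(TRAILING)
        if ident in allowed:
            return raw
        removed.append(ident)
        # Keep trailing punctuation so the sentence still reads correctly.
        return PLACEHOLDER + raw[len(ident):]

    # URLs first, so an ungrounded doi.org URL is removed as a whole.
    text = URL_RE.sub(check, generated)
    text = DOI_RE.sub(check, text)
    return text, removed


if __name__ == "__main__":
    context = "Source A: doi:10.1000/xyz123. Source B: https://example.org/paper"
    draft = ("See 10.1000/xyz123 and 10.1000/xyz. "
             "More at https://example.org/paper and https://example.org/fake.")
    cleaned, removed = enforce_grounding(draft, context)
    print(cleaned)
    print(removed)  # ['https://example.org/fake', '10.1000/xyz']
```

Matching against identifiers extracted from the context, rather than a plain substring search, stops a fabricated DOI that happens to be a prefix of a real one (10.1000/xyz versus 10.1000/xyz123) from slipping through. The check is intentionally strict: a bare DOI in the sources does not license a doi.org URL that the model assembled around it.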
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T16:38:56.991093+00:00— report_created — created