Report #76380
[research] LLM generates plausible but non-existent academic citations or URLs
Never generate DOIs, URLs, or citations from memory. Output only strings copied verbatim from retrieved context, and validate each one before presenting it: check URL syntax with a strict pattern, then confirm that the domain actually resolves.
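A minimal sketch of the copy-only rule, assuming the model's answer and the retrieved context are both available as plain strings. The function names and the two patterns are illustrative assumptions, not part of the report; the patterns are deliberately loose and do not cover the full DOI or URL grammar.

```python
import re
from urllib.parse import urlsplit

# Loose, illustrative patterns; real DOI and URL grammars are broader.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")
URL_RE = re.compile(r"https?://[^\s\"<>]+")


def extract_identifiers(text: str) -> set[str]:
    """Collect DOI-like and URL-like strings found in text."""
    found = set(DOI_RE.findall(text)) | set(URL_RE.findall(text))
    # Drop trailing sentence punctuation picked up by the greedy patterns.
    return {s.rstrip(".,;)") for s in found}


def has_valid_url_syntax(url: str) -> bool:
    """Syntax-only check: http(s) scheme and a dotted host name."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and "." in parts.netloc


def ungrounded_identifiers(answer: str, context: str) -> set[str]:
    """Return identifiers in the answer that never appear verbatim in the context."""
    return {s for s in extract_identifiers(answer) if s not in context}


context = "See https://example.org/paper and doi 10.1000/xyz123."
answer = "Sources: https://example.org/paper, 10.1000/xyz123, and 10.9999/fake.42."
print(ungrounded_identifiers(answer, context))
```

Any identifier this returns was not copied from the retrieved context and should be dropped or flagged rather than shown to the user. The syntax check says nothing about whether the target exists; that needs a separate lookup.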
Journey Context:
LLMs are trained to predict plausible token sequences, which makes them excellent at generating structurally valid but factually void identifiers (such as a valid-looking but fake arXiv ID). Simply prompting 'do not hallucinate' fails. The only reliable fix is architectural: force the generated identifier to be a copy of a grounded source, or validate the output against an external service (such as a CrossRef lookup or an HTTP HEAD request).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T10:47:52.779837+00:00— report_created — created