Report #58458
[research] LLM generates plausible but non-existent academic citations or DOIs
Extract every DOI or URL the model generates and verify it programmatically, with an HTTP HEAD request or a database lookup (e.g., the Crossref API), before presenting it to the user; strip or flag any citation that cannot be verified.
Journey Context:
LLMs are trained to predict plausible token sequences, so they generate structurally valid DOIs (e.g., 10.xxxx/xxxxx) that resolve to nothing. Prompting 'do not hallucinate' fails because the model has no ground-truth retrieval to check its own output against. Programmatic post-validation is the only reliable guardrail against citation fabrication.
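The post-validation step described above could be sketched roughly as follows, using only the Python standard library. The function names (extract_dois, crossref_has_doi, flag_unverified), the DOI regex, and the User-Agent string are illustrative assumptions, not part of this report. The lookup uses a GET against the public Crossref REST API rather than a HEAD request. Note that a miss only means "not found in Crossref": DOIs registered with other agencies would also come back unverified, so flagging is likely safer than silently stripping.

```python
import re
import urllib.parse
import urllib.request
from typing import Callable

# Loose DOI shape (an assumption): "10.", a 4-9 digit registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"'<>]+")
# Punctuation that usually belongs to the sentence, not the DOI (a heuristic;
# it can wrongly trim a DOI that genuinely ends in one of these characters).
TRAILING = ".,;:)]}"


def extract_dois(text: str) -> list[str]:
    """Return unique DOI-like strings in order of first appearance."""
    seen: dict[str, None] = {}
    for match in DOI_PATTERN.finditer(text):
        seen.setdefault(match.group(0).rstrip(TRAILING), None)
    return list(seen)


def crossref_has_doi(doi: str, timeout: float = 10.0) -> bool:
    """Look the DOI up in Crossref; any network or HTTP error counts as unverified."""
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi)
    request = urllib.request.Request(
        url, headers={"User-Agent": "citation-check-sketch/0.1"}  # placeholder value
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # covers URLError, HTTPError (e.g. 404) and socket timeouts
        return False


def flag_unverified(text: str, exists: Callable[[str], bool] = crossref_has_doi) -> str:
    """Append " [UNVERIFIED]" to every DOI that the checker cannot confirm."""
    verdicts = {doi: exists(doi) for doi in extract_dois(text)}

    def annotate(match: re.Match) -> str:
        raw = match.group(0)
        doi = raw.rstrip(TRAILING)
        tail = raw[len(doi):]  # sentence punctuation that followed the DOI
        return raw if verdicts[doi] else f"{doi} [UNVERIFIED]{tail}"

    return DOI_PATTERN.sub(annotate, text)
```

Passing the checker in as the exists parameter keeps the network call swappable, so the flagging logic can be exercised offline with a stub, and a different registry or a HEAD request against a resolver can be substituted without touching the extraction code.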
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T04:36:47.243736+00:00— report_created — created