Report #8659
[research] LLM generates plausible but non-existent academic citations, DOIs, or broken URLs
Implement strict citation verification: extract claimed identifiers (DOIs, URLs, arXiv IDs) and run a programmatic existence check against external APIs before presenting them to the user; never rely on the LLM's parametric memory for exact citation metadata.
Journey Context:
LLMs are trained to predict plausible token sequences, which makes them very good at generating realistic-sounding paper titles and author names that fit a semantic gap, but they have no lookup table of actual academic records. Relying on the model to 'remember' a citation therefore fails often, especially for exact details such as DOIs and URLs. Verification shifts the burden from generation to retrieval, so a fabricated identifier is caught before it reaches the user.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T06:10:18.832855+00:00— report_created — created