Report #37619
[research] LLM generates plausible but non-existent academic citations or URLs
Force the model to output structured metadata (title, authors, year) for every citation and programmatically verify it against an external database (e.g., Semantic Scholar, Crossref API) before presenting it to the user. If a citation cannot be verified, strip it or replace it with 'Citation verification failed'.
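A minimal sketch of the verification step, assuming the model has already emitted each citation as a dict with `title`, `authors`, and `year` keys. It queries the public Crossref REST API (`/works` with `query.bibliographic`); the function names, the 0.9 title-similarity threshold, and the rule "title, year, and at least one author surname must match" are illustrative assumptions, not part of the original report, and should be tuned for real use.

```python
import difflib
import json
import urllib.parse
import urllib.request

CROSSREF_WORKS_URL = "https://api.crossref.org/works"
FAILURE_NOTICE = "Citation verification failed"


def search_crossref(title, rows=5):
    """Return candidate works from Crossref for a claimed title."""
    query = urllib.parse.urlencode({"query.bibliographic": title, "rows": rows})
    with urllib.request.urlopen(f"{CROSSREF_WORKS_URL}?{query}", timeout=10) as resp:
        return json.load(resp)["message"]["items"]


def _normalize(text):
    # Lowercase and collapse whitespace so trivial differences do not matter.
    return " ".join(text.lower().split())


def is_verified(citation, candidates, min_title_ratio=0.9):
    """True if a candidate matches the title, the year, and one author surname.

    The threshold and matching rule are assumptions made for this sketch.
    """
    claimed_title = _normalize(citation["title"])
    claimed_surnames = {
        a.split()[-1].lower() for a in citation["authors"] if a.split()
    }
    for item in candidates:
        titles = item.get("title") or [""]
        ratio = difflib.SequenceMatcher(
            None, claimed_title, _normalize(titles[0])
        ).ratio()
        date_parts = item.get("issued", {}).get("date-parts") or [[None]]
        year = date_parts[0][0] if date_parts[0] else None
        surnames = {a.get("family", "").lower() for a in item.get("author", [])}
        if (
            ratio >= min_title_ratio
            and year == citation["year"]
            and claimed_surnames & surnames
        ):
            return True
    return False


def render_citation(citation, search=search_crossref):
    """Format a citation only if it verifies; otherwise return the notice."""
    try:
        candidates = search(citation["title"])
    except Exception:
        candidates = []  # A failed lookup is treated as unverified.
    if is_verified(citation, candidates):
        authors = ", ".join(citation["authors"])
        return f'{authors} ({citation["year"]}). {citation["title"]}.'
    return FAILURE_NOTICE
```

The `search` parameter is injectable so the lookup can be swapped for Semantic Scholar or stubbed in tests. Treating network errors as "unverified" is a deliberate choice here: the guardrail fails closed rather than letting an unchecked citation through.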
Journey Context:
LLMs are trained to predict plausible token sequences, so they readily produce realistic-sounding paper titles and author combinations. Simply prompting 'do not hallucinate citations' fails because the model cannot reliably tell a reference it recalls from training data apart from one it has merely generated to sound plausible. Programmatic verification is the only reliable guardrail against fabricated references.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T17:37:31.218019+00:00— report_created — created