Report #52223
[research] LLM generates plausible but non-existent academic citations or URLs
Constrain the agent to cite only sources present in the provided RAG context. If it generates a citation de novo, add a verification step before output: HTTP GET the URL, or look the reference up via the Semantic Scholar or Crossref API, and withhold any citation that does not resolve.
Journey Context:
LLMs are trained to predict plausible token sequences, not to query databases. A DOI or author/year combination can therefore be statistically likely yet refer to nothing that exists. RAG mitigates this, but agents often override the retrieved context with hallucinations drawn from their parametric memory. Verification is the only failsafe.
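One way to detect an answer that overrides the RAG context is to check that every DOI or URL it cites appears verbatim in the retrieved chunks. The sketch below is an assumption-laden illustration: the function names, the regular expressions, and the substring-match rule are hypothetical and will miss citations given only as author/year.

```python
import re
from typing import Iterable, List, Set

# Rough patterns for DOIs and URLs; assumptions, not a full grammar for either.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")
URL_RE = re.compile(r"https?://[^\s\"<>]+")
TRAILING_PUNCT = ".,;:)]}'"


def extract_refs(text: str) -> Set[str]:
    """Collect DOIs and URLs from text, lowercased, trailing punctuation removed."""
    found = DOI_RE.findall(text) + URL_RE.findall(text)
    return {ref.rstrip(TRAILING_PUNCT).lower() for ref in found}


def ungrounded_refs(answer: str, context_chunks: Iterable[str]) -> List[str]:
    """Return references cited in the answer that never occur in the RAG context."""
    context = " ".join(context_chunks).lower()
    return sorted(ref for ref in extract_refs(answer) if ref not in context)


if __name__ == "__main__":
    chunks = ["See Smith 2020, doi:10.1234/abcd.5678 for details."]
    reply = "As shown in 10.1234/abcd.5678 and https://example.org/made-up-paper."
    # Anything listed here was not in the retrieved context and needs verification.
    print(ungrounded_refs(reply, chunks))
```

Any reference this returns came from the model rather than the retrieved context, so it should go through external verification or be removed before the answer is emitted.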
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T18:09:08.156122+00:00 - report_created - created