Report #45789
[research] LLM generates plausible but non-existent academic papers, DOIs, or URLs when asked for citations
Never generate a citation from memory. Use a search tool to retrieve the exact URL or DOI; if no search access is available, state explicitly that citations cannot be provided. When a citation is required, copy it verbatim from the retrieved (RAG) context.
Journey Context:
LLMs are trained to predict plausible token sequences, which makes them very good at producing realistic-sounding titles and author combinations that do not exist. Agents often trust the model's internal knowledge for citations, but the model's bias toward fluent output overrides factual accuracy. Strict grounding in search results is the only reliable mitigation, because the model's internal confidence is poorly calibrated for bibliographic data.
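The grounding rule can be enforced mechanically after generation. The following is a minimal sketch, not a definitive implementation: it assumes the retrieved passages are available as plain strings, and the names `extract_identifiers` and `check_citations` as well as both regular expressions are hypothetical choices made for illustration. Any DOI or URL in the model's answer that does not also appear in the retrieved context is flagged as ungrounded.

```python
import re

# Assumed patterns (illustrative, not exhaustive): modern "10.NNNN/suffix" DOIs
# and plain http(s) URLs. Older or unusual DOI forms may not match.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")
URL_RE = re.compile(r"https?://[^\s\"<>]+")
# Punctuation that usually belongs to the sentence, not the identifier.
# Simplification: a DOI that really ends in one of these would be truncated.
TRAILING = ".,;:)]}'"


def extract_identifiers(text):
    """Return the set of DOIs and URLs found in text."""
    found = set()
    for doi in DOI_RE.findall(text):
        # DOI names are case-insensitive, so compare them in lowercase.
        found.add(doi.rstrip(TRAILING).lower())
    for url in URL_RE.findall(text):
        url = url.rstrip(TRAILING)
        # A resolver link such as https://doi.org/10.xxxx/... is already
        # represented by its bare DOI, so it is not counted a second time.
        if not DOI_RE.search(url):
            found.add(url)
    return found


def check_citations(answer, retrieved_passages):
    """Split identifiers cited in answer into grounded and ungrounded lists."""
    allowed = set()
    for passage in retrieved_passages:
        allowed |= extract_identifiers(passage)
    cited = extract_identifiers(answer)
    return {
        "grounded": sorted(cited & allowed),
        "ungrounded": sorted(cited - allowed),
    }
```

An agent could refuse to return an answer, or strip the offending references, whenever the `ungrounded` list is non-empty. This check only confirms that an identifier came from retrieved text; it does not confirm that the cited work exists or supports the claim, so it complements the search step rather than replacing it.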
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:19:48.692498+00:00— report_created — created