Report #92124
[research] LLM generates plausible but fabricated academic citations and DOIs
Implement strict citation verification: require the LLM to quote exact strings from the retrieved context, and never let it generate a DOI or URL from parametric memory.
Journey Context:
LLMs are trained to be helpful and fluent, which makes them good at generating syntactically valid but non-existent citations (a phenomenon known as 'hallucination of scholarly references'). Relying on the model's internal weights for citation facts is unreliable by design: the model predicts tokens by statistical likelihood and does not consult a database of truth. The only reliable fix is hard grounding: extract verbatim spans from a trusted retrieval system and append the source metadata programmatically, so the model never has the opportunity to invent the link.
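A minimal sketch of the hard-grounding step described above, in Python. Everything named here (`Passage`, `render_citation`, `strip_model_links`, the `[link removed]` marker) is hypothetical and for illustration only, and the DOI pattern is a rough approximation, not a full validator. The idea is that a quote is accepted only if it appears verbatim (after whitespace normalization) in a retrieved passage, and the DOI is copied from that passage's stored metadata instead of from model output.

```python
import re
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Passage:
    """A retrieved chunk plus metadata taken from the trusted index (hypothetical schema)."""
    text: str
    title: str
    doi: str


# Rough pattern for URLs and DOI-like strings; an assumption, not a complete DOI grammar.
LINK_PATTERN = re.compile(r"https?://\S+|\b10\.\d{4,9}/\S+", re.IGNORECASE)


def normalize(s: str) -> str:
    """Collapse runs of whitespace so line wrapping does not defeat the match."""
    return " ".join(s.split())


def render_citation(quote: str, passages: Sequence[Passage]) -> Optional[str]:
    """Return a citation string only if the quote is a verbatim span of a passage.

    The DOI is read from the passage metadata, never from the model's text.
    Returns None when the quote cannot be grounded.
    """
    needle = normalize(quote)
    if not needle:
        return None
    for passage in passages:
        if needle in normalize(passage.text):
            return f'"{needle}" ({passage.title}, doi:{passage.doi})'
    return None


def strip_model_links(answer: str) -> str:
    """Remove any URL or DOI-like string the model wrote in free text."""
    return LINK_PATTERN.sub("[link removed]", answer)
```

In use, the caller would run `strip_model_links` on the model's free-text answer, pass each quoted span through `render_citation`, and drop or flag any quote that comes back as `None`. Exact substring matching is deliberately strict; a fuzzy match would accept paraphrases, which reopens the door to invented claims.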
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:13:22.222729+00:00— report_created — created