Report #17162
[research] LLM generates plausible but fabricated academic citations and DOIs
Implement strict citation grounding by forcing the LLM to output verbatim spans from the source text; reject any citation not exactly matching the retrieved document metadata.
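A minimal sketch of the verbatim-span check described above, assuming the retrieved source text is available as a plain string. The helper names `normalize` and `is_verbatim_span` are hypothetical and not part of this report; whitespace is collapsed before comparison, which is an added assumption so that line wraps in the source do not cause false rejections.

```python
import re


def normalize(text: str) -> str:
    """Collapse runs of whitespace so line wraps do not break matching."""
    return re.sub(r"\s+", " ", text).strip()


def is_verbatim_span(quote: str, source_text: str) -> bool:
    """Return True only if the quoted span occurs verbatim in the source.

    Empty quotes are rejected, since an empty string is trivially a
    substring of any text.
    """
    normalized_quote = normalize(quote)
    return bool(normalized_quote) and normalized_quote in normalize(source_text)


# Illustrative source text (made up for this example).
source = "Grounded generation reduces fabricated\nreferences in long-form answers."

print(is_verbatim_span("reduces fabricated references", source))
print(is_verbatim_span("eliminates fabricated references", source))
```

The first call passes because the span appears in the source once whitespace is normalized; the second is rejected because the model's wording differs from the source by one word.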
Journey Context:
LLMs are trained on vast corpora where DOIs and titles follow strict patterns, making hallucinated references look structurally perfect. Relying on the LLM to 'recall' a citation fails because it generalizes the pattern rather than retrieving the specific instance. Structural validation (regex for DOI) is insufficient; exact-match grounding against a trusted corpus is required to prevent the model from inventing plausible but non-existent references.
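The difference between structural validation and exact-match grounding can be sketched as follows. Everything here is an illustrative assumption: the `Citation` type, the `TRUSTED_CORPUS` dictionary, the DOIs and titles in it, and the simplified DOI pattern are all made up for the example and do not come from this report or any real registry.

```python
import re
from dataclasses import dataclass

# Simplified structural pattern for a DOI (assumption: prefix "10.",
# a 4-9 digit registrant code, a slash, then a non-empty suffix).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


@dataclass(frozen=True)
class Citation:
    doi: str
    title: str


# Hypothetical trusted corpus: metadata of retrieved documents, keyed by DOI.
TRUSTED_CORPUS = {
    "10.1234/example.2021.001": Citation(
        doi="10.1234/example.2021.001",
        title="An Example Retrieved Paper",
    ),
}


def looks_like_doi(doi: str) -> bool:
    """Structural check only: says nothing about whether the DOI exists."""
    return DOI_PATTERN.match(doi) is not None


def is_grounded(citation: Citation, corpus: dict) -> bool:
    """Accept a citation only if DOI and title exactly match a corpus record."""
    record = corpus.get(citation.doi)
    return record is not None and record == citation


real = Citation("10.1234/example.2021.001", "An Example Retrieved Paper")
fabricated = Citation("10.1234/example.2022.017", "A Plausible But Invented Paper")
wrong_title = Citation("10.1234/example.2021.001", "A Slightly Different Title")

for c in (real, fabricated, wrong_title):
    print(c.doi, looks_like_doi(c.doi), is_grounded(c, TRUSTED_CORPUS))
```

All three citations pass the regex, but only the first survives the corpus lookup. The fabricated DOI is rejected because it has no record, and the third is rejected because its title does not match the metadata stored for that DOI.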
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T04:42:41.110977+00:00— report_created — created