Report #85303
[research] LLM generates plausible but fabricated academic citations or DOI links
Force the model to extract citations strictly from the provided context, using constrained generation or an exact-match regex, then add a programmatic verification step that checks whether each URL/DOI resolves before the citation is shown to the user.
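A minimal sketch of the exact-match step, assuming citations are identified by DOI and that both the retrieved context and the model's answer are available as plain strings. The names `DOI_PATTERN`, `extract_dois` and `split_citations` are hypothetical, and the regex is a common approximation of modern DOI syntax rather than a complete grammar:

```python
import re

# Approximate pattern for modern DOIs (prefix "10.", 4-9 digit registrant, "/", suffix).
# Assumption: this covers the DOIs you expect; it is not an exhaustive DOI grammar.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s\"<>]+", re.IGNORECASE)


def extract_dois(text: str) -> set[str]:
    """Return the DOIs found in text, lowercased, with trailing punctuation stripped.

    Caveat: stripping a trailing ')' or ']' will damage the rare DOI that
    legitimately ends with one of those characters.
    """
    return {match.rstrip(".,;:)]}").lower() for match in DOI_PATTERN.findall(text)}


def split_citations(answer: str, context: str) -> tuple[set[str], set[str]]:
    """Split the DOIs cited in the answer into (grounded, ungrounded).

    A DOI counts as grounded only if the same string also appears in the context.
    """
    allowed = extract_dois(context)
    cited = extract_dois(answer)
    return cited & allowed, cited - allowed


if __name__ == "__main__":
    context = "Source: Smith et al. (2020), doi:10.1000/xyz123."
    answer = "See 10.1000/xyz123 and also 10.9999/fake.2021.001, which agree."
    grounded, ungrounded = split_citations(answer, context)
    print("grounded:", sorted(grounded))
    print("ungrounded:", sorted(ungrounded))
```

Any DOI in the ungrounded set would be dropped, or the answer regenerated, before the response reaches the user. The same idea extends to URLs by swapping in a URL pattern.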
Journey Context:
LLMs are trained to be helpful, so they will synthesize a citation that 'looks' right (right authors, journal, year) rather than fail. Relying on the LLM to 'just know' real citations fails because the latent space interpolates between real papers. Grounding alone isn't enough: the model must be penalized heavily for any citation not in the context window, and the system must programmatically validate the output, because LLMs cannot internally distinguish real URLs from hallucinated ones.
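The programmatic validation could look roughly like the sketch below, which asks the public `https://doi.org/` resolver whether a DOI resolves, using only the Python standard library. The function names (`http_status`, `doi_resolves`, `verified_only`) and the `User-Agent` string are hypothetical. Two assumptions to keep in mind: some publisher sites reject `HEAD` requests or automated clients, so a failure here may be a false negative, and a DOI that resolves is not proof that it points to the paper the model described.

```python
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable, Iterable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status for url (redirects followed), or 0 on network failure."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "citation-check/0.1"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; report their status code.
        return err.code
    except (OSError, ValueError):
        # DNS failures, timeouts, refused connections, malformed URLs.
        return 0


def doi_resolves(doi: str, fetch_status: Callable[[str], int] = http_status) -> bool:
    """True if the doi.org URL for this DOI ends in a 2xx response."""
    url = "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    return 200 <= fetch_status(url) < 300


def verified_only(
    dois: Iterable[str], fetch_status: Callable[[str], int] = http_status
) -> list[str]:
    """Keep only the DOIs that resolve, preserving input order."""
    return [doi for doi in dois if doi_resolves(doi, fetch_status)]
```

The `fetch_status` parameter is there so the check can be exercised offline with a stub, for example `doi_resolves("10.1000/xyz123", lambda url: 404)`, and so the network call can be replaced by a cached or rate-limited client in production.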
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:46:12.747933+00:00— report_created — created