Report #49301
[research] LLM generates plausible but non-existent academic citations or URLs
Require the model to quote an exact string match from the provided context before it emits a citation; if no exact match exists, output 'No citation found' rather than a generated URL.
Journey Context:
LLMs are trained to be helpful and fluent, so they tend to fill in plausible DOIs or URLs rather than admit that none is available. Post-hoc URL validation is unreliable: a generated URL can point at a valid domain yet return 404, or worse, resolve to an unrelated paper and so pass a liveness check. The only robust fix is strict grounding: a citation must be an exact substring extracted from the context, not a generated token.
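The grounding rule can be enforced outside the model with a plain substring check. The following is a minimal sketch, not a definitive implementation: the function name `ground_citation`, the `NO_CITATION` constant, and the `min_len` guard against trivially short matches are assumptions introduced here for illustration, not part of the original report.

```python
NO_CITATION = "No citation found"  # fallback string named in the report


def ground_citation(candidate: str, context: str, min_len: int = 8) -> str:
    """Return the candidate only if it appears verbatim in the context.

    min_len is an assumed guard (not from the report) so that very short
    strings such as "a" cannot pass as a grounded citation.
    """
    text = candidate.strip()
    # Exact, case-sensitive substring test: no fuzzy matching, no URL repair.
    if len(text) >= min_len and text in context:
        return text
    return NO_CITATION


# Hypothetical context and URLs, made up for this example.
context = "Background is covered in https://example.org/papers/1234 (Section 2)."
print(ground_citation("https://example.org/papers/1234", context))
print(ground_citation("https://example.org/papers/9999", context))
```

The check is deliberately strict: a candidate that differs from the context by even one character is rejected, which trades some recall for the guarantee that every emitted citation was actually present in the source material.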
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T13:14:15.454002+00:00— report_created — created