Report #74798
[research] Generating plausible but non-existent academic citations or DOIs
Require retrieval-augmented generation (RAG) with strict citation matching: if a citation is not explicitly present in the retrieved context, output nothing or a placeholder, never a generated URL/DOI.
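A minimal sketch of the strict-matching rule. It assumes the retrieved context is available as plain text and that a regex is good enough to spot DOIs and URLs, which is a simplification and not an exhaustive parser. Any DOI or URL in the answer that does not appear verbatim in the context is replaced with a placeholder. `enforce_citations`, `DOI_RE`, `URL_RE`, and the placeholder text are hypothetical names, not part of any library.

```python
import re

# Rough, assumed patterns: not a complete DOI/URL grammar.
# Both force the match to end on a non-punctuation character so "…/abc." loses the period.
DOI_RE = re.compile(r"(?:doi:\s*)?\b(10\.\d{4,9}/[-._;()/:a-z0-9]*[a-z0-9])", re.IGNORECASE)
URL_RE = re.compile(r"https?://[^\s<>\"')\]]*[^\s<>\"')\].,;:]")
PLACEHOLDER = "[citation unavailable]"  # hypothetical placeholder text


def enforce_citations(answer: str, context: str) -> str:
    """Keep a DOI/URL in `answer` only if it appears verbatim in the retrieved `context`."""
    allowed_urls = set(URL_RE.findall(context))
    # findall returns the captured bare DOI; DOIs are case-insensitive, so compare lowercased.
    allowed_dois = {d.lower() for d in DOI_RE.findall(context)}

    def check_url(m: re.Match) -> str:
        return m.group(0) if m.group(0) in allowed_urls else PLACEHOLDER

    def check_doi(m: re.Match) -> str:
        return m.group(0) if m.group(1).lower() in allowed_dois else PLACEHOLDER

    # Check URLs first so a fabricated https://doi.org/... link is replaced as a whole.
    answer = URL_RE.sub(check_url, answer)
    return DOI_RE.sub(check_doi, answer)


ctx = "Smith et al. (2020), https://doi.org/10.1000/xyz123."
print(enforce_citations("See https://doi.org/10.1000/xyz123 and doi:10.9999/fake.42.", ctx))
```

The filter never generates an identifier; it can only keep one copied from the context or remove one, which is the point of the rule above.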
Journey Context:
LLMs are trained to be helpful and fluent, which leads them to 'fill in' citation formats (such as DOIs or URLs) that look syntactically valid but resolve to 404 pages or to the wrong papers. Prompting alone does not fix this, because the model's prior toward fluent pattern completion overrides negative constraints in the prompt. The only reliable fix is architectural: decouple generation from citation by allowing the model to output only identifiers found in a trusted retrieval context.
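One way to sketch this decoupling, assuming the model is asked to cite with short source keys such as `[S1]` (a hypothetical convention): the prompt lists retrieved sources by key and title only, and DOIs are attached afterwards from the trusted index, so the model never writes an identifier string itself. `Source`, `build_prompt`, and `resolve_citations` are illustrative names, not an existing API.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """A retrieved document; its DOI comes from the trusted index, never from the model."""
    key: str    # hypothetical short ID shown to the model, e.g. "S1"
    title: str
    doi: str


MARKER_RE = re.compile(r"\[(S\d+)\]")  # assumed citation-marker convention: [S1], [S2], ...


def build_prompt(question: str, sources: list[Source]) -> str:
    """List sources by key and title only, so there is no DOI string to imitate or complete."""
    listing = "\n".join(f"[{s.key}] {s.title}" for s in sources)
    return (
        "Answer using only the sources below. Cite with their [S#] markers; "
        "never write a DOI or URL.\n\n" + listing + f"\n\nQuestion: {question}"
    )


def resolve_citations(model_output: str, sources: list[Source]) -> tuple[str, list[str]]:
    """Drop markers with no matching retrieved source; build references from trusted metadata."""
    by_key = {s.key: s for s in sources}
    cited: list[str] = []

    def replace(m: re.Match) -> str:
        key = m.group(1)
        if key not in by_key:
            return ""  # unknown key: emit nothing rather than a guess
        if key not in cited:
            cited.append(key)
        return m.group(0)

    text = MARKER_RE.sub(replace, model_output)
    refs = [f"[{k}] {by_key[k].title}. https://doi.org/{by_key[k].doi}" for k in cited]
    return text, refs
```

Unknown keys are dropped rather than guessed, matching the rule of emitting nothing when a citation is not in the retrieved context.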
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:09:01.680380+00:00— report_created — created