Report #2860
[research] LLM generates plausible-sounding but non-existent citations and DOIs when asked for references
Never ask the model to recall citations from memory. Retrieve sources first, then generate answers, and require every citation to map to a real, fetched source ID/URL/DOI that the agent can resolve.
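The retrieval-first workflow above has two steps. First, put the fetched sources in the prompt under stable IDs. Second, reject any answer whose citations do not map to one of those IDs. This is a minimal Python sketch. It assumes the retrieval step has already run and that the model is told to cite with bracketed markers like `[S1]`. `Source`, `build_prompt`, `check_citations`, and `is_grounded` are hypothetical names, not part of any library.

```python
import re
from dataclasses import dataclass


# Hypothetical record for a source the agent has already fetched.
@dataclass(frozen=True)
class Source:
    source_id: str  # ID assigned at retrieval time, e.g. "S1" (assumed format)
    url: str
    title: str


# Assumed citation format: bracketed IDs such as [S1] or [S12].
CITATION_RE = re.compile(r"\[(S\d+)\]")


def build_prompt(question: str, sources: list[Source]) -> str:
    """List the fetched sources under their IDs and restrict citations to them."""
    listing = "\n".join(f"[{s.source_id}] {s.title} <{s.url}>" for s in sources)
    return (
        "Answer using ONLY the sources below. Cite every claim with its source ID "
        "in square brackets, e.g. [S1]. Never cite anything not listed.\n\n"
        f"Sources:\n{listing}\n\nQuestion: {question}"
    )


def check_citations(answer: str, sources: list[Source]) -> tuple[list[str], list[str]]:
    """Split the citation markers in an answer into (known_ids, unknown_ids)."""
    allowed = {s.source_id for s in sources}
    cited = list(dict.fromkeys(CITATION_RE.findall(answer)))  # dedupe, keep order
    known = [c for c in cited if c in allowed]
    unknown = [c for c in cited if c not in allowed]
    return known, unknown


def is_grounded(answer: str, sources: list[Source]) -> bool:
    """Accept only answers that cite at least one fetched source and nothing else."""
    known, unknown = check_citations(answer, sources)
    return bool(known) and not unknown


if __name__ == "__main__":
    fetched = [
        Source("S1", "https://example.org/a", "Doc A"),
        Source("S2", "https://example.org/b", "Doc B"),
    ]
    draft = "Claim one [S1]. Claim two [S7]."
    print(check_citations(draft, fetched))  # (['S1'], ['S7'])
    print(is_grounded(draft, fetched))      # False: S7 was never retrieved
```

Rejecting or regenerating on any unknown ID is deliberately strict. A marker that matches no fetched source is exactly the fabricated-citation case this report describes. This check only proves that each citation points at something retrieved. Whether the cited source actually supports the claim still needs a separate check.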
Journey Context:
This is a common failure mode in legal and medical domains and has led to real court sanctions (e.g., Mata v. Avianca, where attorneys were sanctioned for filing a brief that cited non-existent cases generated by ChatGPT). Citation generation is a language-modeling task, not a retrieval task: the model optimizes for citations that look plausible, not for ones that exist. Retrieval-first generation with exact-source attribution avoids fabricated references. Fine-tuning for citation generation helps only when the citations are grounded in retrieved evidence.
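The requirement that every DOI be one the agent can resolve can be checked mechanically. The check has to go beyond syntax, because fabricated DOIs are usually well-formed. This is a minimal sketch. It assumes the public doi.org proxy replies with a redirect for a registered DOI and with 404 for an unknown one. `check_doi` and `doi_resolves_via_doi_org` are hypothetical helpers. The resolver is injectable, so it can be swapped for a Crossref or DataCite lookup or stubbed out in tests.

```python
import re
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable

# A pattern Crossref has suggested for modern DOIs; it covers most, not all, DOIs.
DOI_RE = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)

_PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/",
             "http://dx.doi.org/", "doi:")


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Surface 3xx responses as HTTPError instead of following them."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def doi_resolves_via_doi_org(doi: str, timeout: float = 10.0) -> bool:
    """Network check. ASSUMES doi.org redirects registered DOIs and 404s unknown ones."""
    url = "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=timeout):
            return False  # a 2xx from doi.org itself is unexpected; treat as unverified
    except urllib.error.HTTPError as err:
        if err.code in (301, 302, 303, 307, 308):
            return True
        if err.code == 404:
            return False
        raise  # other statuses (rate limits, outages) are not evidence either way


def check_doi(doi: str, resolver: Callable[[str], bool] = doi_resolves_via_doi_org) -> str:
    """Classify a model-supplied DOI as 'malformed', 'unresolved', or 'resolves'."""
    doi = doi.strip()
    for prefix in _PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_RE.match(doi):
        return "malformed"
    # A well-formed DOI can still be fabricated, so syntax alone is never enough.
    return "resolves" if resolver(doi) else "unresolved"
```

Only `"resolves"` should let a citation through. `"malformed"` and `"unresolved"` are both grounds to drop the reference or regenerate from retrieved sources. Resolving proves that the DOI exists, not that it is the work the model claims. Comparing the registered title against the cited one is a sensible next step.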
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T14:31:03.596626+00:00— report_created — created