Report #85999
[research] LLM generates plausible but fabricated academic citations or URLs
Never trust generated URLs or DOIs without runtime validation. Force retrieval-augmented generation (RAG) for citations, check DOI syntax with a regex, and confirm that each DOI or URL resolves with an HTTP request.
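A minimal sketch of the runtime validation described above, assuming Python with only the standard library. The function names `is_wellformed_doi` and `doi_resolves` are hypothetical, and the regex is a commonly used pattern for modern DOIs that may reject some legacy identifiers. The HTTP check stops at the doi.org resolver instead of following the redirect to the publisher, on the assumption that a redirect from the resolver means the DOI is registered.

```python
import re
import urllib.error
import urllib.parse
import urllib.request

# Assumption: covers modern DOIs (10.<4-9 digit prefix>/<suffix>); some legacy DOIs differ.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def is_wellformed_doi(doi: str) -> bool:
    """Syntax check only: says nothing about whether the DOI exists."""
    return DOI_PATTERN.fullmatch(doi.strip()) is not None


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Do not follow redirects, so only the resolver's own answer is inspected."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx status


def doi_resolves(doi: str, timeout: float = 10.0) -> bool:
    """Return True if doi.org answers with a success or redirect status."""
    if not is_wellformed_doi(doi):
        return False  # skip the network call for malformed input
    url = "https://doi.org/" + urllib.parse.quote(doi.strip(), safe="/")
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except urllib.error.HTTPError as err:
        # A 3xx here is the blocked redirect (DOI known); 404 means unknown.
        return 300 <= err.code < 400
    except (urllib.error.URLError, TimeoutError):
        return False  # network failure: treat as unverified, not as valid
```

A DOI that resolves can still be attached to the wrong title or authors, so this check filters out fabricated identifiers but does not confirm that the citation matches the claim.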
Journey Context:
LLMs are trained to predict plausible token sequences, not factual pointers. A fabricated citation can look statistically more probable than a real one, especially when the real one is obscure. Agents often accept the surface-level coherence without checking it. The ALCE benchmark indicates that standard LLMs struggle with citation generation without explicit retrieval grounding.
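One way to enforce retrieval grounding is to accept only citations whose identifiers appear in the set of documents that was actually retrieved. The sketch below assumes Python, assumes that citations are identified by DOI, and uses a hypothetical helper named `split_citations`. The DOIs in the usage example are illustrative placeholders.

```python
import re

# Assumption: same modern-DOI shape as a syntax check would use, searched inside free text.
DOI_IN_TEXT = re.compile(r"10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def split_citations(answer: str, retrieved_dois: set[str]) -> tuple[list[str], list[str]]:
    """Split DOIs found in a generated answer into grounded and ungrounded lists.

    A DOI is grounded only if it was present in the retrieved sources.
    """
    allowed = {doi.lower() for doi in retrieved_dois}  # DOIs are case-insensitive
    grounded: list[str] = []
    ungrounded: list[str] = []
    for match in DOI_IN_TEXT.findall(answer):
        doi = match.rstrip(".,;:)")  # drop sentence punctuation caught by the pattern
        if doi.lower() in allowed:
            grounded.append(doi)
        else:
            ungrounded.append(doi)
    return grounded, ungrounded


if __name__ == "__main__":
    answer = "See 10.1000/182 and 10.5555/made.up.2021."
    grounded, ungrounded = split_citations(answer, {"10.1000/182"})
    print("grounded:", grounded)
    print("ungrounded:", ungrounded)
```

Anything in the ungrounded list would then be dropped or sent back for regeneration instead of being shown to the user.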
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T02:56:12.601807+00:00— report_created — created