Report #11331
[research] LLM generates plausible but fabricated academic citations, DOIs, or URLs when asked for sources
Implement strict retrieval-augmented generation (RAG) in which every citation must reference the ID of a returned search result. Never ask the LLM to produce a URL, DOI, or paper title from its parametric memory.
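A minimal sketch of the ID-constrained approach, not a definitive implementation. The `Source` record, its field names, the `[S1]`-style citation syntax, and the helper names `build_prompt` and `resolve_citations` are all assumptions made for illustration. The retrieval layer assigns the IDs, the model only ever sees IDs and snippets, and URLs are attached afterwards by lookup.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """One retrieved search result. All field names are hypothetical."""
    source_id: str  # e.g. "S1", assigned by the retrieval layer, not the LLM
    title: str
    url: str
    snippet: str


# Assumed citation syntax: the model cites as [S1], [S2], ...
CITATION_TAG = re.compile(r"\[(S\d+)\]")
# Rough signs of a free-form reference written from memory.
FREE_FORM_REF = re.compile(r"https?://|\b10\.\d{4,9}/\S+", re.IGNORECASE)


def build_prompt(question: str, sources: list[Source]) -> str:
    """Show the model only IDs and snippets, never URLs or DOIs."""
    listing = "\n".join(f"[{s.source_id}] {s.snippet}" for s in sources)
    return (
        "Answer using only the sources below. Cite each claim with its "
        "bracketed ID, for example [S1]. Do not write URLs, DOIs, or titles.\n\n"
        f"{listing}\n\nQuestion: {question}"
    )


def resolve_citations(answer: str, sources: list[Source]) -> list[Source]:
    """Map cited IDs back to retrieved records; reject anything else."""
    if FREE_FORM_REF.search(answer):
        raise ValueError("answer contains a free-form URL or DOI")
    by_id = {s.source_id: s for s in sources}
    # Deduplicate cited IDs while keeping first-seen order.
    cited = list(dict.fromkeys(CITATION_TAG.findall(answer)))
    unknown = [i for i in cited if i not in by_id]
    if unknown:
        raise ValueError(f"answer cites IDs that were not retrieved: {unknown}")
    return [by_id[i] for i in cited]
```

With this shape, the title and URL shown to the reader come from the retrieved record returned by `resolve_citations`, so an answer that cites an unknown ID or writes its own link is rejected rather than displayed.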
Journey Context:
LLMs are trained to be helpful and will confidently construct URLs that follow standard formats (e.g., arXiv IDs) or pair real author names with plausible but fake paper titles. Post-hoc verification of LLM-generated citations is unreliable because a fabricated source often cannot be told apart from a real one without resolving each reference through deep web traversal. Constraining citations to retrieved chunks forces every reference to be grounded in a document the system actually fetched.
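To illustrate why shape-only checks do not help, the sketch below validates identifiers purely by format. The regular expressions are rough approximations of the new-style arXiv ID and DOI shapes, and the two sample strings are arbitrary values typed for this example and not looked up anywhere. Both pass, which is all a format check can say: it reports nothing about whether the paper exists.

```python
import re

# Rough shape of a new-style arXiv identifier: YYMM.NNNN or YYMM.NNNNN,
# with an optional version suffix such as v2.
ARXIV_ID = re.compile(r"\d{4}\.\d{4,5}(v\d+)?")
# Rough shape of a DOI: "10.", a registrant code, a slash, then a suffix.
DOI = re.compile(r"10\.\d{4,9}/\S+")


def looks_like_arxiv_id(text: str) -> bool:
    """True if the text has the shape of an arXiv ID. No lookup is done."""
    return ARXIV_ID.fullmatch(text) is not None


def looks_like_doi(text: str) -> bool:
    """True if the text has the shape of a DOI. No lookup is done."""
    return DOI.fullmatch(text) is not None


# Arbitrary strings invented for this example, not checked against any registry.
made_up_arxiv = "2913.54321"
made_up_doi = "10.1234/example.2024.001"

print(looks_like_arxiv_id(made_up_arxiv))  # True: the shape is valid
print(looks_like_doi(made_up_doi))         # True: the shape is valid
```

Confirming existence would require resolving each identifier against its registry, which is the per-reference traversal that the retrieval-constrained design avoids.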
Lifecycle
2026-06-16T13:08:37.713190+00:00— report_created — created