Report #15622
[research] Generating plausible but non-existent academic citations or URLs
Implement strict citation verification: cite a source only if its exact string appears in the provided context, or after a tool has verified the URL/DOI (for example via a search API) before output. Never generate a URL by pattern matching.
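A minimal sketch of the check described above, assuming the draft answer and the retrieved context are both available as plain strings. The names `CITATION_RE`, `extract_citations`, `filter_citations`, and the optional `verify` callback are hypothetical and only for illustration. The regex is a rough approximation of URL and DOI shapes, not a complete parser, and `verify` stands in for whatever search or DOI lookup tool is actually available.

```python
import re
from typing import Callable, List, Optional, Tuple

# Rough, illustrative pattern (an assumption, not a full URL/DOI grammar):
# either an http(s) URL or a bare DOI starting with "10.".
CITATION_RE = re.compile(
    r"https?://[^\s<>\"')\]]+|\b10\.\d{4,9}/[^\s<>\"')\]]+"
)


def extract_citations(text: str) -> List[str]:
    """Return URLs and DOIs found in text, in order, without duplicates."""
    found: List[str] = []
    for match in CITATION_RE.finditer(text):
        # Drop trailing sentence punctuation that the pattern may swallow.
        candidate = match.group(0).rstrip(".,;:")
        if candidate and candidate not in found:
            found.append(candidate)
    return found


def filter_citations(
    draft: str,
    context: str,
    verify: Optional[Callable[[str], bool]] = None,
) -> Tuple[List[str], List[str]]:
    """Split the draft's citations into (allowed, rejected).

    A citation is allowed if the exact same string was extracted from the
    context, or if the hypothetical verify callback confirms it.
    """
    in_context = set(extract_citations(context))
    allowed: List[str] = []
    rejected: List[str] = []
    for citation in extract_citations(draft):
        if citation in in_context:
            allowed.append(citation)
        elif verify is not None and verify(citation):
            allowed.append(citation)
        else:
            rejected.append(citation)
    return allowed, rejected


context = "See https://example.org/paper-1 and doi 10.1234/abcd.5678."
draft = (
    "As shown in https://example.org/paper-1, and also "
    "https://example.org/paper-2 (10.1234/abcd.5678)."
)
allowed, rejected = filter_citations(draft, context)
print("allowed:", allowed)
print("rejected:", rejected)
```

Anything in `rejected` would be stripped from the answer or sent to an external lookup before output. Comparing against citations extracted from the context, rather than using a raw substring test, avoids accepting a URL that is only a prefix of a longer one in the context.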
Journey Context:
LLMs are trained to predict plausible token sequences, which makes them very good at producing realistic-looking but entirely fake URLs, DOIs, and paper titles. This is a known failure mode in RAG and academic search. The tradeoff is that strict grounding limits the breadth of answers, but fake citations destroy user trust entirely. Relying on the model's internal knowledge for citations is fundamentally broken because hallucination and fluency are correlated: a fabricated citation reads just as fluently as a real one, so a well-formed reference is no evidence that it exists.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T00:40:51.474674+00:00— report_created — created