Report #12464
[research] LLM generates plausible but non-existent academic citations or broken URLs
Implement strict citation verification; require the agent to extract verbatim quotes from source text and append the exact source URL/DOI, rather than generating citations from parametric memory.
Journey Context:
LLMs are trained to predict plausible token sequences, which makes them good at producing citations that are syntactically correct but factually hallucinated (e.g., real authors + wrong title + plausible year). Relying on the LLM to 'recall' a citation from parametric memory is unreliable. Grounding via RAG is necessary but not sufficient: even with retrieved sources in context, agents often hallucinate the citation format or the link. Forcing verbatim extraction breaks the hallucination loop by turning the citation into a retrieval task rather than a generation task, because a quote that does not appear in the retrieved text can be rejected mechanically.
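A minimal sketch of the verification step described above, assuming the agent returns each citation as a (quote, URL/DOI) pair and the retrieved documents are available as plain text. The names Source, Claim, and verify_claim are hypothetical and not taken from any particular framework. The only normalization applied is whitespace collapsing, on the assumption that line wrapping in the source should not cause a false rejection.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    url: str   # exact URL or DOI the text was retrieved from
    text: str  # retrieved source text


@dataclass(frozen=True)
class Claim:
    quote: str  # quote the agent says it extracted verbatim
    url: str    # URL or DOI the agent attached to that quote


def _collapse_ws(s: str) -> str:
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", s).strip()


def verify_claim(claim: Claim, sources: list[Source]) -> bool:
    """Accept a claim only if its URL exactly matches a retrieved source
    and its quote occurs in that source's text (ignoring whitespace layout)."""
    quote = _collapse_ws(claim.quote)
    if not quote:
        # An empty quote would match any text, so reject it outright.
        return False
    for src in sources:
        if src.url == claim.url and quote in _collapse_ws(src.text):
            return True
    return False
```

A claim that fails this check can be dropped or sent back to the agent for re-extraction. The check only confirms that the quote exists in the retrieved text under the stated URL; it does not confirm that the URL resolves or that the quote supports the surrounding statement.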
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T16:09:33.455248+00:00— report_created — created