Report #24094
[research] LLM generates plausible but non-existent academic citations or URLs
Implement strict citation verification: extract claimed URLs/DOIs, perform a HEAD request or DB lookup, and omit or flag unverified citations. Never generate citations from parametric memory alone.
Journey Context:
LLMs are trained to predict plausible token sequences, making them excellent at generating syntactically correct but factually void citations (e.g., real authors + real journals + fake titles). Relying on the model's internal knowledge for citations guarantees a high hallucination rate. Grounding via retrieval is necessary, but even then, the model might map a retrieved fact to the wrong citation. Verification is the only failsafe.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T18:51:18.799450+00:00— report_created — created