Report #42019
[research] LLM generates plausible but non-existent academic citations or URLs
Implement strict string-matching validation of every generated URL or citation against an external trusted database (e.g., PubMed, the Semantic Scholar API) before presenting it to the user; never rely on the LLM to self-correct when asked 'are you sure?'
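A minimal Python sketch of that validation step follows. It assumes each citation has already been parsed into a DOI, a title, and author surnames. The names `Citation`, `verify_citation`, and `semantic_scholar_lookup` are hypothetical. The Semantic Scholar request shape (a `DOI:`-prefixed paper ID on the Graph API `paper` endpoint, with a `fields` parameter) is an assumption to check against the current API documentation. The lookup is injected as a callable so that a PubMed wrapper or a local fixture can replace it.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Citation:
    doi: str
    title: str
    authors: list[str] = field(default_factory=list)  # surnames as claimed by the LLM


# A lookup maps a lower-cased DOI to a trusted record {"title": str, "authors": [surname, ...]},
# or returns None when the trusted source has no such DOI.
Lookup = Callable[[str], Optional[dict]]


def _norm(s: str) -> str:
    # Strict matching: only case and whitespace are normalized; no fuzzy or partial matches.
    return " ".join(s.casefold().split())


def verify_citation(c: Citation, lookup: Lookup) -> tuple[bool, str]:
    record = lookup(c.doi.strip().lower())  # DOIs are case-insensitive
    if record is None:
        return False, "DOI not found in trusted source"
    if _norm(record["title"]) != _norm(c.title):
        return False, "title does not match the DOI's record"
    on_record = {_norm(a) for a in record["authors"]}
    missing = [a for a in c.authors if _norm(a) not in on_record]
    if missing:
        return False, f"authors not on record: {missing}"
    return True, "verified"


def semantic_scholar_lookup(doi: str) -> Optional[dict]:
    # Assumption: Semantic Scholar Graph API paper endpoint accepting "DOI:<doi>" IDs.
    # No API key, retries, or rate-limit handling; adapt before real use.
    paper_id = urllib.parse.quote(f"DOI:{doi}", safe=":/")
    url = f"https://api.semanticscholar.org/graph/v1/paper/{paper_id}?fields=title,authors"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            data = json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # unknown DOI: treat the citation as unverified
        raise
    names = [a.get("name") or "" for a in data.get("authors", [])]
    return {
        "title": data.get("title") or "",
        "authors": [n.split()[-1] for n in names if n.strip()],  # crude surname extraction
    }
```

Strict equality on titles will also reject citations with harmless variations, such as a missing subtitle. For this failure mode that is the safer error, because a flagged real paper costs a manual check while an accepted fake one reaches the user.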
Journey Context:
LLMs are trained to predict plausible token sequences, which makes them very good at generating citations that are syntactically correct but semantically void (fake DOIs, real authors paired with the wrong papers). Asking the model 'are you sure?' often leads it to double down on the hallucination or to produce a new fake citation. Grounding must therefore be external and programmatic.
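A well-formed DOI proves nothing on its own. The following hypothetical gate illustrates this: it pulls DOI-shaped strings out of the model's answer, checks each one with an external lookup callable (for example, a wrapper around PubMed or the Semantic Scholar API), and reports the failures instead of re-prompting the model. The regex is a loose assumption, not the official DOI grammar, and it only covers DOIs, not arbitrary URLs. `extract_dois` and `ground_answer` are illustrative names.

```python
import re
from typing import Callable, Optional

# Loose DOI shape: "10." + 4-9 digit registrant code + "/" + non-space suffix (an assumption).
DOI_RE = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def extract_dois(text: str) -> list[str]:
    found: list[str] = []
    for m in DOI_RE.finditer(text):
        doi = m.group(0).rstrip(".,;:").lower()  # drop sentence punctuation, normalize case
        if doi not in found:
            found.append(doi)
    return found


def ground_answer(
    text: str, lookup: Callable[[str], Optional[dict]]
) -> tuple[list[str], list[str]]:
    """Split DOIs in an LLM answer into (verified, unverified) via an external lookup.

    Unverified DOIs are surfaced to the caller; they are never sent back to the
    model for 'self-correction'."""
    verified: list[str] = []
    unverified: list[str] = []
    for doi in extract_dois(text):
        (verified if lookup(doi) is not None else unverified).append(doi)
    return verified, unverified
```

A fabricated DOI such as `10.9999/made.up.2024` passes `DOI_RE` just as easily as a real one. Only the lookup separates them, which is why the check lives outside the model.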
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T01:00:15.209631+00:00— report_created — created