Report #54689
[research] LLM fabricates plausible-looking but non-existent academic citations, DOIs, or URLs
Enforce extraction-only citation via RAG, so the model may cite only references that appear in the retrieved sources, and add a programmatic verification step (e.g., an HTTP GET or a DOI resolver check) before any URL is shown to the user.
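A minimal sketch of the extraction-only rule, assuming the retrieved passages are available as plain strings. The names `extract_urls` and `ungrounded_urls` and the URL regex are illustrative assumptions, not part of any particular RAG framework. Any URL in the answer that does not appear verbatim in a retrieved passage is flagged.

```python
import re
from typing import Iterable, Set

# Deliberately simple URL matcher (an assumption for this sketch);
# it stops at whitespace, angle brackets, parentheses and square brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>()\[\]]+")


def extract_urls(text: str) -> Set[str]:
    """Return the URLs found in text, without trailing sentence punctuation."""
    return {match.rstrip(".,;:") for match in URL_PATTERN.findall(text)}


def ungrounded_urls(answer: str, retrieved_passages: Iterable[str]) -> Set[str]:
    """Return URLs cited in the answer that appear in no retrieved passage."""
    allowed: Set[str] = set()
    for passage in retrieved_passages:
        allowed |= extract_urls(passage)
    return extract_urls(answer) - allowed
```

A non-empty result means the answer cites something the retriever never supplied, so the caller can drop those citations or regenerate the answer. Exact string matching is strict by design: a URL that differs only by a trailing slash or a tracking parameter is also flagged.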
Journey Context:
LLMs predict syntactically valid token sequences, not truthful ones. Plausible URLs (like arxiv.org/abs/XXXX.XXXXX) follow strict patterns, so a fabricated one is highly probable as text yet points to nothing. Post-hoc prompting ('are you sure?') rarely catches this because the model's confidence in the syntax remains high. Architectural grounding and runtime verification are both required.
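The runtime verification step could look roughly like the following, using only the Python standard library. The helper names (`http_status`, `citation_url`, `filter_verified`), the DOI regex, and the 2xx acceptance rule are assumptions made for this sketch. A failed check does not prove a reference is fabricated, since some sites block automated requests; it only means the reference could not be confirmed.

```python
import http.client
import re
import urllib.error
import urllib.request
from typing import Callable, Iterable, List, Optional

# Rough DOI shape: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def http_status(url: str, timeout: float = 10.0) -> int:
    """Issue an HTTP GET and return the final status code, or 0 on failure."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "citation-check/0.1"}  # hypothetical UA string
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 for an unknown DOI or missing page
    except (OSError, ValueError, http.client.HTTPException):
        return 0  # DNS failure, timeout, malformed URL, broken response


def citation_url(reference: str) -> Optional[str]:
    """Map a bare DOI or an http(s) URL to the URL to probe, else None."""
    ref = reference.strip()
    if DOI_PATTERN.match(ref):
        return "https://doi.org/" + ref  # the DOI resolver redirects to the target
    if ref.startswith(("http://", "https://")):
        return ref
    return None


def filter_verified(
    references: Iterable[str],
    fetch_status: Callable[[str], int] = http_status,
) -> List[str]:
    """Keep only the references whose probe URL answers with a 2xx status."""
    verified = []
    for ref in references:
        url = citation_url(ref)
        if url is not None and 200 <= fetch_status(url) < 300:
            verified.append(ref)
    return verified
```

Passing `fetch_status` as a parameter keeps the network call swappable, so the filter can be exercised offline with a stub and the real `http_status` used only at output time.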
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T22:17:23.903204+00:00— report_created — created