Report #25079
[research] Generating plausible but non-existent academic citations or URLs
Require the agent to extract citations strictly from the provided grounding text. If it generates a citation de novo, append a verification step that resolves the URL or DOI and labels the citation 'unverified' when the lookup returns 404 or otherwise fails.
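A minimal sketch of such a verification step, using only the Python standard library. The names `to_url`, `http_status`, and `check_citation` are hypothetical, and the sketch assumes bare DOIs can be checked through the public resolver at https://doi.org/. Some servers reject HEAD requests, so a real check may need a GET fallback. A successful lookup only shows that the link exists, not that the cited work supports the claim, hence the label 'resolved' rather than 'verified'.

```python
import urllib.error
import urllib.request
from typing import Callable

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver (assumed reachable)


def to_url(ref: str) -> str:
    """Pass URLs through; turn a bare or 'doi:'-prefixed DOI into a resolver URL."""
    ref = ref.strip()
    if ref.lower().startswith(("http://", "https://")):
        return ref
    if ref.lower().startswith("doi:"):
        ref = ref[4:].strip()
    return DOI_RESOLVER + ref


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a HEAD request, or 0 if the request fails."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions; keep the status code.
        return err.code
    except (OSError, ValueError):
        # DNS failure, timeout, refused connection, or malformed URL.
        return 0


def check_citation(ref: str, status_of: Callable[[str], int] = http_status) -> str:
    """Label a citation 'resolved' if the lookup succeeds, else 'unverified'."""
    status = status_of(to_url(ref))
    return "resolved" if 200 <= status < 400 else "unverified"
```

The `status_of` parameter exists so the check can be exercised offline with a stub in place of a live request.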
Journey Context:
LLMs are trained to predict plausible token sequences, making them excellent at generating realistic-looking but entirely fake DOIs and paper titles. Post-hoc prompting ('Are you sure?') rarely fixes this because the model is already anchored to its generation. The only reliable fix is architectural: decoupling generation from retrieval, or enforcing a strict grounding constraint where the source text must precede the citation.
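One way to enforce the grounding constraint outside the model is to reject any DOI or URL in the agent's answer that does not appear verbatim in the supplied source text. The sketch below is an illustration under assumptions: `extract_refs` and `ungrounded_refs` are hypothetical names, and the regular expressions are deliberately simple approximations of DOI and URL syntax, not complete parsers.

```python
import re

# Rough patterns: a DOI starts with "10.", a 4-9 digit registrant code, and "/".
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"'<>]+")
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def extract_refs(text: str) -> set[str]:
    """Collect DOI- and URL-shaped strings, trimming trailing punctuation."""
    found = DOI_RE.findall(text) + URL_RE.findall(text)
    return {ref.rstrip(".,;:)]") for ref in found}


def ungrounded_refs(answer: str, grounding: str) -> set[str]:
    """Return references in the answer that never occur in the grounding text."""
    return extract_refs(answer) - extract_refs(grounding)


grounding = "See Smith et al., doi:10.1000/abc123, and https://example.org/paper."
answer = "Supported by 10.1000/abc123 and 10.9999/fake.2021.001 (https://example.org/paper)."
print(ungrounded_refs(answer, grounding))  # {'10.9999/fake.2021.001'}
```

Matching is exact, so a DOI cited as a resolver URL in the answer but as a bare DOI in the source is flagged. That errs on the side of caution; normalizing both forms first would relax it.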
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:29:53.993066+00:00— report_created — created