Report #3602
[research] Hallucinated URLs and DOIs in generated academic or technical citations
Enforce strict retrieval-augmented generation (RAG), emitting a URL only if it exactly matches a string in the retrieved sources, or append a verification step that sends an HTTP GET to each URL/DOI before the citation is output.
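The exact-string-matching variant can be sketched roughly as follows. This assumes the retrieved RAG passages are available as plain strings. `extract_identifiers` and `unsupported_identifiers` are hypothetical helper names, and the regular expressions are simplified assumptions (the DOI pattern is modeled on Crossref's recommended one), not a complete citation parser.

```python
import re
from typing import Iterable

# Simplified URL pattern (assumption): stops at whitespace, quotes, angle brackets, ')' and ']'.
URL_RE = re.compile(r"https?://[^\s<>\"'\])]+")
# DOI pattern modeled on Crossref's recommended regex, matched case-insensitively.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def extract_identifiers(text: str) -> set[str]:
    """Collect URLs and bare DOIs from text, trimming trailing sentence punctuation."""
    found: set[str] = set()
    for pattern in (URL_RE, DOI_RE):
        for match in pattern.findall(text):
            found.add(match.rstrip(".,;:"))
    return found


def unsupported_identifiers(generated: str, retrieved_docs: Iterable[str]) -> set[str]:
    """Return identifiers in the model output that do not occur verbatim in any retrieved document."""
    docs = list(retrieved_docs)  # materialize so a generator can be scanned once per identifier
    return {
        ident
        for ident in extract_identifiers(generated)
        if not any(ident in doc for doc in docs)
    }
```

Anything returned by `unsupported_identifiers` would then be dropped or regenerated instead of being shown to the user; exact matching is deliberately strict, so a real link that was merely reformatted (for example, `http` vs `https`) is also rejected.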
Journey Context:
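LLMs are trained to predict plausible tokens, so a string such as 'https://arxiv.org/abs/2023.xxxxx' looks statistically likely even when no such paper exists. Post-hoc checking is required because the model's own confidence in these fabricated links (for example, the probabilities it assigns to their tokens) is often high, so that confidence cannot be used to filter them out.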
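A post-hoc check of this kind might look like the following minimal sketch. It assumes Python's standard-library `urllib` and the public `https://doi.org/` resolver for bare DOIs. `doi_to_url`, `http_status`, and `verify_citations` are hypothetical helper names, the User-Agent string is arbitrary, and `fetch_status` is injectable so the logic can be exercised without network access.

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional


def doi_to_url(identifier: str) -> str:
    """Send bare DOIs to the doi.org resolver; leave full URLs unchanged."""
    if identifier.startswith(("http://", "https://")):
        return identifier
    return "https://doi.org/" + identifier


def http_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """GET the URL (urlopen follows redirects) and return the final HTTP status, or None if unreachable."""
    try:
        req = urllib.request.Request(url, headers={"User-Agent": "citation-check/0.1"})
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Must come before OSError: HTTPError subclasses URLError, which subclasses OSError.
        return err.code  # e.g. 404, which doi.org typically returns for an unregistered DOI
    except (OSError, ValueError, http.client.HTTPException):
        return None  # DNS failure, timeout, refused connection, or malformed URL


def verify_citations(
    identifiers: Iterable[str],
    fetch_status: Callable[[str], Optional[int]] = http_status,
) -> dict[str, bool]:
    """Map each URL/DOI to True only if it resolved with a 2xx status."""
    results: dict[str, bool] = {}
    for ident in identifiers:
        status = fetch_status(doi_to_url(ident))
        results[ident] = status is not None and 200 <= status < 300
    return results
```

A 2xx result only shows that the identifier resolves, not that it points to the work being cited. Some publishers also reject automated requests (for example with 403), so a `False` is better treated as "unverified" than as proof of fabrication.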
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T17:37:18.300708+00:00— report_created — created