Report #84172
[research] Hallucinated academic citations and arXiv paper references
Implement structural validation for any generated citation (e.g., DOI/arXiv ID format check via regex) and enforce a strict 'no citation without exact quote/URL' policy; default to stating 'I could not find a peer-reviewed source for this claim' rather than inventing one.
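A minimal sketch of the format check, in Python. The function names and the patterns are assumptions for illustration: the DOI pattern follows the commonly used Crossref-style shape and does not cover every legacy DOI, and the arXiv patterns cover the new-style (YYMM.NNNNN) and old-style (archive/YYMMNNN) identifier schemes. Passing this check only means the string is well formed, not that the paper exists.

```python
import re

# Assumed Crossref-style DOI shape: "10.", a 4-9 digit registrant code, "/", a suffix.
DOI_RE = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")
# New-style arXiv ID: YYMM.NNNN or YYMM.NNNNN, with an optional version such as v2.
ARXIV_NEW_RE = re.compile(r"(\d{2})(\d{2})\.\d{4,5}(v\d+)?")
# Old-style arXiv ID: archive, optional subject class, then /YYMMNNN and optional version.
ARXIV_OLD_RE = re.compile(r"[a-z-]+(\.[A-Z]{2})?/\d{7}(v\d+)?")


def is_well_formed_doi(candidate: str) -> bool:
    """True if the whole string has the shape of a DOI (does not prove it resolves)."""
    return DOI_RE.fullmatch(candidate.strip()) is not None


def is_well_formed_arxiv_id(candidate: str) -> bool:
    """True if the whole string has the shape of an arXiv ID (does not prove it exists)."""
    candidate = candidate.strip()
    match = ARXIV_NEW_RE.fullmatch(candidate)
    if match:
        # The second two digits encode the month, so reject values outside 01-12.
        return 1 <= int(match.group(2)) <= 12
    return ARXIV_OLD_RE.fullmatch(candidate) is not None
```

A citation that fails either check can be dropped outright; one that passes still needs to be resolved against a real source before it is shown.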
Journey Context:
LLMs have a strong prior toward completing patterns. Asked for a paper on topic X, a model will generate a plausible title, author list, and DOI because it is predicting statistically likely tokens, not looking anything up. Structural checks catch some fabrications, such as malformed identifiers, but a well-formed DOI or arXiv ID can still point to nothing or to an unrelated paper, so the only reliable fix is requiring verbatim grounding from a retrieved document. Benchmarks such as TruthfulQA suggest that models often produce fluent but false completions rather than expressing uncertainty.
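The 'no citation without exact quote/URL' policy could be sketched as follows, in Python. Everything here is hypothetical scaffolding: `grounded_citation`, its parameters, and the output format are illustrative names, and `retrieved_text` is assumed to be the text of a document that was actually fetched from `url` by some retrieval step not shown. Matching is verbatim apart from whitespace normalization.

```python
import re

# Fallback wording taken from the recommendation in this report.
FALLBACK = "I could not find a peer-reviewed source for this claim"


def _normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace so line wrapping does not break exact matching."""
    return re.sub(r"\s+", " ", text).strip()


def grounded_citation(quote: str, url: str, retrieved_text: str) -> str:
    """Return a citation only if the quote occurs verbatim in the retrieved document.

    Hypothetical helper: returns the fallback message when the quote or URL is
    missing, or when the quote cannot be found in retrieved_text.
    """
    quote_norm = _normalize_whitespace(quote)
    url_norm = url.strip()
    if not quote_norm or not url_norm:
        return FALLBACK
    if quote_norm not in _normalize_whitespace(retrieved_text):
        return FALLBACK
    return f'"{quote_norm}" ({url_norm})'
```

The check is deliberately strict: a paraphrase of the source fails, which trades some recall for the guarantee that every emitted citation is backed by text that was actually retrieved.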
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:52:35.310650+00:00— report_created — created