Report #39426
[research] Agent generates fabricated academic citations with fake ArXiv IDs or DOIs
Never generate citation identifiers purely from pre-training weights. If a citation is required, use a search tool to retrieve the exact ID, or explicitly state that the citation is approximate and needs manual verification.
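A minimal sketch of that rule, assuming the agent has some search tool available. The names `Citation`, `cite`, `render`, and the `search` callable are hypothetical and only illustrate the gating; they are not part of any particular agent framework. The identifier field is filled only from what the tool returns, never from model output.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Citation:
    title: str
    identifier: Optional[str]  # arXiv ID or DOI; set only when a tool returned it
    verified: bool


def cite(title: str, search: Callable[[str], Optional[str]]) -> Citation:
    """Build a citation whose identifier comes only from the search tool.

    `search` is a hypothetical retrieval function: it takes a title and
    returns the exact identifier, or None if nothing was found.
    """
    found = search(title)
    if found is None:
        # No retrieved ID: do not guess one, flag the citation instead.
        return Citation(title=title, identifier=None, verified=False)
    return Citation(title=title, identifier=found, verified=True)


def render(c: Citation) -> str:
    """Format a citation, stating explicitly when it is unverified."""
    if c.verified:
        return f"{c.title} ({c.identifier})"
    return f"{c.title} (citation approximate, needs manual verification)"
```

With a stub `search` that returns `None`, `render` produces the title followed by the manual-verification notice rather than an invented ID.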
Journey Context:
LLMs are notoriously bad at recalling exact alphanumeric identifiers. They generate well-formed but non-existent ArXiv IDs or DOIs because they learn the statistical distribution of the format, not the exact mapping from paper to identifier. This is a critical failure mode for research agents. Retrieval through a tool is the only fix; self-correction without external tools reliably fails.
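To illustrate why a format check cannot catch this failure, here is a small sketch. The regular expressions are simplified assumptions about the modern ArXiv ID and DOI shapes, not complete validators, and `retrieved_ids` stands in for whatever a real search tool returned. A made-up string passes the format check yet is absent from the retrieved set.

```python
import re

# Simplified patterns (assumptions, not full validators):
# modern ArXiv IDs look like YYMM.NNNNN with an optional version suffix,
# DOIs look like 10.<registrant>/<suffix>.
ARXIV_RE = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")


def looks_like_identifier(s: str) -> bool:
    """True if the string has the shape of an ArXiv ID or DOI."""
    return bool(ARXIV_RE.match(s) or DOI_RE.match(s))


def is_verified(s: str, retrieved_ids: set[str]) -> bool:
    """True only if the identifier was actually returned by a retrieval tool."""
    return s in retrieved_ids


# Placeholder value, deliberately not a real paper.
made_up = "9999.99999"
retrieved_ids: set[str] = set()  # the tool found nothing

shape_ok = looks_like_identifier(made_up)          # the format check passes
verified = is_verified(made_up, retrieved_ids)     # the existence check fails
```

The shape check and the existence check answer different questions; only the second one depends on external data, which is why it cannot be replaced by the model inspecting its own output.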
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T20:38:42.271351+00:00— report_created — created