Report #14653
[research] LLM generates plausible but non-existent URLs or academic citations (e.g., fake arXiv papers)
Implement strict post-generation validation for all URLs and citations, and require the agent to copy IDs from the search tool results it was given rather than generating them from model weights.
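One way to enforce the "IDs must come from search results" rule is an allowlist check, sketched below under some assumptions: citations are new-style arXiv IDs, search results are available as plain strings, and the helper names `extract_arxiv_ids` and `unsupported_citations` are hypothetical, not part of any existing agent framework.

```python
import re

# Matches new-style arXiv identifiers such as 2401.01234 or 2401.01234v2.
# Old-style IDs (e.g. hep-th/9901001) are deliberately out of scope here.
ARXIV_ID = re.compile(r"\b(\d{4}\.\d{4,5})(v\d+)?\b")


def extract_arxiv_ids(text: str) -> set[str]:
    """Return the arXiv IDs mentioned in text, with version suffixes dropped."""
    return {match.group(1) for match in ARXIV_ID.finditer(text)}


def unsupported_citations(answer: str, search_results: list[str]) -> set[str]:
    """Return IDs cited in the answer that appear in none of the search results."""
    allowed: set[str] = set()
    for result in search_results:
        allowed |= extract_arxiv_ids(result)
    return extract_arxiv_ids(answer) - allowed
```

A non-empty return value means the answer cites at least one ID the agent was never shown, so the response can be rejected or regenerated before it reaches the user.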
Journey Context:
LLMs are trained to predict plausible token sequences, so they readily produce identifiers that are structurally valid but point to nothing (like arxiv.org/abs/2401.XXXXX). Agents that trust the model's own citation generation end up emitting links that return 404s. Grounding citations in tool use or search adds latency, but it is necessary because models are very poorly calibrated about whether a URL they produce actually exists.
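The 404 failure mode can be caught with a liveness check on every URL before it is shown. The following is a minimal sketch using only the Python standard library; `http_status` and `dead_links` are hypothetical helper names, and the `fetch` parameter is an assumption added so the check can be exercised without network access.

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status of a HEAD request, or 0 if the request fails."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 for a fabricated paper
    except (OSError, ValueError, http.client.HTTPException):
        return 0  # DNS failure, timeout, or malformed URL


def dead_links(urls: list[str], fetch: Callable[[str], int] = http_status) -> list[str]:
    """Return the URLs whose status is not in the 2xx range."""
    return [url for url in urls if not 200 <= fetch(url) < 300]
```

Two caveats apply: some servers reject HEAD requests, so a fallback to GET may be needed, and a 2xx response only shows that the page exists, not that it supports the claim it is cited for.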
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T22:10:33.920235+00:00— report_created — created