Report #13607
[research] LLM generates plausible but non-existent arXiv papers or GitHub issue links when asked for sources
Force the LLM to extract citations strictly from the provided context, via constrained decoding or strict prompt boundaries; never ask an LLM to 'find a source' without a retrieval tool.
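One way to set a strict prompt boundary is to show the model its sources only as numbered entries and have it cite by number, so it never writes a URL itself. The sketch below assumes this setup; `build_prompt`, `resolve_citations`, the `<sources>` markers, and the example.org URLs are hypothetical names chosen for illustration, not part of any library.

```python
import re

def build_prompt(question, sources):
    """Build a prompt that exposes sources only by numeric ID.

    sources: list of (url, text) pairs returned by a retrieval step (assumed).
    """
    numbered = "\n".join(
        f"[{i}] {text}" for i, (_, text) in enumerate(sources, start=1)
    )
    return (
        "Answer using ONLY the sources between the markers. "
        "Cite with bracketed source numbers such as [1]. "
        "If the sources do not answer the question, say so. "
        "Do not write URLs.\n"
        "<sources>\n" + numbered + "\n</sources>\n"
        "Question: " + question
    )

def resolve_citations(answer, sources):
    """Map [n] markers in the answer back to URLs.

    Raises ValueError if the answer cites an ID that was never provided.
    """
    ids = [int(n) for n in re.findall(r"\[(\d+)\]", answer)]
    bad = [n for n in ids if not 1 <= n <= len(sources)]
    if bad:
        raise ValueError(f"answer cites unknown source IDs: {bad}")
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return [sources[n - 1][0] for n in dict.fromkeys(ids)]

# Usage with placeholder sources:
sources = [
    ("https://example.org/a", "Alpha text"),
    ("https://example.org/b", "Beta text"),
]
prompt = build_prompt("What does Beta say?", sources)
urls = resolve_citations("Beta says so [2]. See also [1][2].", sources)
```

The URLs shown to the user come from the retrieval results, not from the model's output, so a made-up link has no path into the final answer. An out-of-range ID is rejected rather than silently dropped.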
Journey Context:
LLMs are trained to be helpful, so when they have no real source they tend to fabricate URLs that match the statistical distribution of real ones (e.g., arxiv.org/abs/2401.XXXXX). Evaluations like ALCE show that without explicit retrieval and citation enforcement, LLMs default to generating 'hallucinated citations.' The fix requires treating citation generation as a strict extraction task, not a generative one.
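Treating citations as extraction can also be checked after generation: any link in the answer that does not appear in the provided context is flagged. The following is a minimal sketch under that assumption. `LINK_PATTERN` and `split_citations` are hypothetical names, the pattern covers only arXiv abstract links and GitHub issue links, and the IDs in the usage lines are placeholders, not real papers.

```python
import re

# Assumed pattern: arXiv abstract links and GitHub issue links, with or
# without a scheme. Extend it for other link types as needed.
LINK_PATTERN = re.compile(
    r"(?:https?://)?"
    r"(?:arxiv\.org/abs/\d{4}\.\d{4,5}"
    r"|github\.com/[\w.-]+/[\w.-]+/issues/\d+)"
)

def normalize(link):
    """Drop the scheme and lowercase so equivalent links compare equal."""
    return re.sub(r"^https?://", "", link).rstrip("/").lower()

def split_citations(answer, context_docs):
    """Return (supported, unsupported) links found in the answer.

    A link is supported only if the same link occurs in the context.
    """
    allowed = {
        normalize(m) for doc in context_docs for m in LINK_PATTERN.findall(doc)
    }
    supported, unsupported = [], []
    for link in LINK_PATTERN.findall(answer):
        key = normalize(link)
        bucket = supported if key in allowed else unsupported
        if key not in bucket:
            bucket.append(key)
    return supported, unsupported

# Usage with placeholder IDs:
context = ["Retrieved note: see https://arxiv.org/abs/0000.00001 for details."]
answer = (
    "This follows arxiv.org/abs/0000.00001 and "
    "https://arxiv.org/abs/0000.00002."
)
supported, unsupported = split_citations(answer, context)
```

A non-empty `unsupported` list is a signal to reject or regenerate the answer. This check only confirms that a link came from the context; it does not confirm that the cited source supports the claim.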
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T19:14:37.947590+00:00— report_created — created