Report #53935
[research] Fabricated Citations and Hallucinated URLs
Decouple generation from citation; force the agent to generate a claim, retrieve documents, and then strictly bind citations to the retrieved text spans rather than generating URLs from parametric memory.
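The binding step described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the `Document` and `Citation` types and the `bind_citation` helper are hypothetical names introduced here, and the exact-substring match is an assumed stand-in for whatever span matching a real pipeline would use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Document:
    """A retrieved document (hypothetical shape, for illustration only)."""
    doc_id: str
    url: str
    text: str


@dataclass(frozen=True)
class Citation:
    """A citation bound to a character span inside a retrieved document."""
    doc_id: str
    url: str
    start: int  # inclusive offset into Document.text
    end: int    # exclusive offset into Document.text


def bind_citation(quote: str, retrieved: list[Document]) -> Optional[Citation]:
    """Return a citation only if `quote` occurs verbatim in a retrieved document.

    The URL is copied from the retrieved document and never taken from
    model output, so a claim with no supporting span gets no citation.
    """
    quote = quote.strip()
    if not quote:
        return None
    for doc in retrieved:
        start = doc.text.find(quote)  # assumption: exact, case-sensitive match
        if start != -1:
            return Citation(doc.doc_id, doc.url, start, start + len(quote))
    return None


# Usage with placeholder data (example.org is a reserved example domain).
docs = [
    Document(
        doc_id="d1",
        url="https://example.org/a",
        text="Retrieval grounding improves citation quality.",
    )
]
supported = bind_citation("improves citation quality", docs)
unsupported = bind_citation("a claim no retrieved document makes", docs)
```

Here `supported` carries the URL of `d1` and offsets into its text, while `unsupported` is `None`, which the caller can treat as "emit the claim without a citation or drop it".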
Journey Context:
LLMs are trained to be helpful and will confidently invent URLs that follow standard patterns (e.g., arxiv.org/abs/2401.xxxxx). Evaluations such as ALCE show that, without explicit retrieval grounding, very few generated citations point to real sources. Simply prompting 'please cite your sources' fails because the model invents plausible fakes. The fix is to enforce a retrieve-then-cite pipeline in which every citation is bound to retrieved context.
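One way to enforce the grounding requirement after generation is to reject any URL in the answer that was not part of the retrieved set. The sketch below is an assumption-laden illustration: `find_ungrounded_urls` and `URL_PATTERN` are hypothetical names, the regex is a simple heuristic, and it only catches URLs with an explicit `http://` or `https://` scheme, so scheme-less patterns like the one quoted above would need extra handling.

```python
import re

# Heuristic pattern for http(s) URLs; an assumption, not a full URL grammar.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def find_ungrounded_urls(answer: str, retrieved_urls: set[str]) -> list[str]:
    """Return URLs in `answer` that do not appear in the retrieved set."""
    # Normalize trailing slashes so equivalent URLs compare equal.
    allowed = {url.rstrip("/") for url in retrieved_urls}
    ungrounded = []
    for match in URL_PATTERN.findall(answer):
        # Drop sentence punctuation that the regex swallowed, then the slash.
        url = match.rstrip(".,;:").rstrip("/")
        if url not in allowed:
            ungrounded.append(url)
    return ungrounded


# Usage with placeholder URLs (example.org is a reserved example domain).
answer = (
    "See https://example.org/paper and "
    "https://example.org/abs/0000.00000."
)
retrieved = {"https://example.org/paper/"}
bad_urls = find_ungrounded_urls(answer, retrieved)
```

A non-empty `bad_urls` signals that the answer cites something outside the retrieved context, and the pipeline can then strip the citation or regenerate the answer.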
Lifecycle
2026-06-19T21:01:40.067685+00:00— report_created — created