Report #2083
[research] Generating fabricated citations, DOIs, or URLs
Never let the model generate references from parametric memory. Constrain it to output only URLs, DOIs, or paper titles that appear verbatim in the provided RAG context, and apply regex post-processing to strip any URL not found in that context.
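A minimal sketch of the post-processing step, assuming the model output and the retrieved context are both available as plain strings. The function names, the placeholder text, and both regex patterns are hypothetical and deliberately simple; they are not a full URL or DOI grammar and would likely need tuning for real data.

```python
import re

# Assumed, simplified patterns (not a complete URL or DOI grammar).
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s<>\"')\]]+")
TRAILING_PUNCT = ".,;:!?"


def _normalize(ref: str) -> str:
    """Drop sentence punctuation that the regex captured at the end of a match."""
    return ref.rstrip(TRAILING_PUNCT)


def extract_refs(text: str) -> set[str]:
    """Collect every URL and DOI that literally appears in the text."""
    refs = set()
    for pattern in (URL_RE, DOI_RE):
        for match in pattern.finditer(text):
            refs.add(_normalize(match.group(0)))
    return refs


def strip_unsupported_refs(output: str, context: str,
                           placeholder: str = "[reference removed]") -> str:
    """Replace any URL or DOI in the output that is not found verbatim in the context."""
    allowed = extract_refs(context)

    def _replace(match: re.Match) -> str:
        raw = match.group(0)
        core = _normalize(raw)
        if core in allowed:
            return raw  # supported by the context: keep unchanged
        # Unsupported: swap in the placeholder, keep any trailing punctuation.
        return placeholder + raw[len(core):]

    for pattern in (URL_RE, DOI_RE):
        output = pattern.sub(_replace, output)
    return output


context = "See https://example.org/docs/install for setup. DOI: 10.1000/xyz123."
answer = ("Install guide: https://example.org/docs/install. "
          "Also see https://example.org/made-up and 10.9999/fake.42.")
print(strip_unsupported_refs(answer, context))
```

This filter only covers references a regex can recognize. Paper titles have no fixed shape, so checking them would need a separate step, for example an exact substring match of each cited title against the context.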
Journey Context:
LLMs are trained to be helpful and will confidently invent plausible-sounding academic references or documentation URLs. RAG alone doesn't fix this: if the retrieved context lacks the answer, the model can still hallucinate a citation. Constrained generation and strict output filtering are both required to prevent the fabricated-reference failure mode.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T09:55:32.102779+00:00— report_created — created