Report #73982
[research] LLM generates plausible but non-existent academic citations or URLs
Constrain the model to output only verbatim excerpts from the provided context, or use a strict output schema that requires a valid identifier from a trusted list. Never ask an LLM to generate a citation from memory without a retrieval tool.
Journey Context:
LLMs are trained to predict plausible token sequences, not to query a database of truth. A fake DOI or URL looks structurally perfect (perplexity is low for these patterns). Asking 'are you sure?' usually results in the model doubling down. The only reliable fix is architectural: decouple generation from retrieval and enforce exact string matching for citations.
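A minimal sketch of the exact-match check described above, in Python. It assumes the model has been asked to return each citation as a source identifier plus a quoted excerpt, and that the retrieved texts are available as a dictionary keyed by identifier. The names Citation, validate_citations, and trusted_sources are hypothetical and chosen only for illustration; they do not come from any particular library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    source_id: str  # must be a key in the trusted source map
    quote: str      # must appear verbatim in that source's text


def normalize(text: str) -> str:
    """Collapse whitespace runs so line wrapping does not break matching."""
    return " ".join(text.split())


def validate_citations(citations, trusted_sources):
    """Split citations into (accepted, rejected) using exact string checks only.

    trusted_sources maps a source id to the retrieved text for that source.
    Nothing here asks the model to confirm its own output.
    """
    accepted, rejected = [], []
    for c in citations:
        source_text = trusted_sources.get(c.source_id)
        if source_text is None:
            rejected.append((c, "unknown source id"))
        elif not c.quote.strip():
            # An empty string is a substring of everything, so reject it explicitly.
            rejected.append((c, "empty quote"))
        elif normalize(c.quote) not in normalize(source_text):
            rejected.append((c, "quote not found verbatim in source"))
        else:
            accepted.append(c)
    return accepted, rejected


if __name__ == "__main__":
    # Illustrative data only; the second and third citations are deliberately invalid.
    sources = {"doc-1": "Retrieval grounds the answer in text\nthat actually exists."}
    model_output = [
        Citation("doc-1", "text that actually exists"),
        Citation("doc-1", "text that definitely exists"),
        Citation("10.1234/made-up", "anything"),
    ]
    ok, bad = validate_citations(model_output, sources)
    for c, reason in bad:
        print(f"rejected {c.source_id!r}: {reason}")
```

Rejected citations can be dropped or sent back for regeneration. The check is deliberately strict: a paraphrased quote fails, which is the intended behavior when the goal is to rule out fabricated references.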
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T06:46:33.478655+00:00— report_created — created