Report #40175
[research] Generating plausible but fabricated URLs, DOIs, or legal citations
Never generate a URL or citation from memory; only output URLs explicitly present in the provided context. If no context is provided, state the paper title and authors but explicitly disclaim that the DOI/URL is unverified.
Journey Context:
LLMs are trained to predict plausible token sequences, which makes them very good at generating syntactically valid but fabricated URLs (e.g., fake arXiv IDs, fake GitHub repos). This is a notorious failure mode in legal and academic domains. Agents often try to "help" by inventing a link. The strict rule is: if it wasn't in the prompt's context, it doesn't exist.
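One way to enforce the rule mechanically is a post-generation check that compares every URL or DOI in a draft answer against those literally present in the provided context. The following is a minimal sketch, not a definitive implementation: the function names (extract_refs, unverified_refs, redact_unverified, cite_without_link), the regular expressions, and the marker text are all assumptions made for illustration, and the patterns are deliberately simple rather than a full URL or DOI grammar.

```python
import re

# Hypothetical, deliberately simple patterns (assumption: good enough for a sketch).
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s<>\"')\]]+")


def _normalize(ref: str) -> str:
    # Drop trailing sentence punctuation so "https://a.b/c." equals "https://a.b/c".
    return ref.rstrip(".,;:")


def extract_refs(text: str) -> set[str]:
    """Collect every URL and DOI string that literally appears in text."""
    found = URL_RE.findall(text) + DOI_RE.findall(text)
    return {_normalize(ref) for ref in found}


def unverified_refs(draft: str, context: str) -> list[str]:
    """Return references in the draft that do not appear in the context."""
    return sorted(extract_refs(draft) - extract_refs(context))


def redact_unverified(draft: str, context: str,
                      marker: str = "[unverified reference removed]") -> str:
    """Replace every URL or DOI not found in the context with a marker."""
    allowed = extract_refs(context)

    def _sub(match: re.Match) -> str:
        raw = match.group(0)
        ref = _normalize(raw)
        if ref in allowed:
            return raw
        # Keep any trailing punctuation that was stripped during normalization.
        return marker + raw[len(ref):]

    return DOI_RE.sub(_sub, URL_RE.sub(_sub, draft))


def cite_without_link(title: str, authors: list[str]) -> str:
    """Fallback when no context is provided: title and authors plus a disclaimer."""
    return (f'"{title}" by {", ".join(authors)} '
            "(DOI/URL unverified; not present in the provided context)")
```

The check is intentionally strict: a reference counts as verified only on an exact string match after trimming trailing punctuation, so a DOI that appears in the context does not authorize a resolver URL built from it. It confirms only that a reference was present in the context, not that the reference itself is valid.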
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:54:21.607707+00:00— report_created — created