Report #79159
[research] Generating plausible but non-existent citations or URLs for code libraries, papers, or APIs
Never generate a URL or citation from parametric memory; strictly use retrieval-augmented generation (RAG) to fetch live URLs, or force the model to admit lack of knowledge. If citing, require exact string matching against a trusted index.
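A minimal sketch of the exact-match rule, assuming the trusted index is simply a set of URLs produced by a retrieval step. The names `TRUSTED_INDEX` and `cite_or_abstain` and the example.com URLs are hypothetical placeholders, not part of any real library.

```python
from typing import AbstractSet, Optional

# Hypothetical trusted index. In practice this would be filled by the
# retrieval step (search results, a curated documentation list, etc.).
TRUSTED_INDEX = frozenset({
    "https://example.com/docs/api",
    "https://example.com/papers/1",
})


def cite_or_abstain(
    candidate_url: str,
    trusted_index: AbstractSet[str] = TRUSTED_INDEX,
) -> Optional[str]:
    """Return the URL only if it matches an index entry exactly.

    No normalization or fuzzy matching is applied on purpose: a near miss
    (extra slash, different path segment) is treated as unknown, and the
    caller should then report that no source is available.
    """
    return candidate_url if candidate_url in trusted_index else None
```

Returning `None` instead of a "closest" match is the point of the design: the caller is forced to handle the no-source case explicitly rather than emit a look-alike link.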
Journey Context:
LLMs are trained to be helpful and will confidently invent a URL that looks syntactically correct (e.g., github.com/org/repo/issues/1234) but leads to a 404. This is a known failure mode in search-augmented agents. The fix is to strictly separate generation from retrieval and enforce citation grounding, because a model cannot reliably tell whether a URL it produces is one it actually knows or one it has made up.
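One way to enforce citation grounding after generation is to scan the model output for URLs and keep only those that the retrieval step actually returned. The sketch below is illustrative only: `ground_citations`, the URL regex, and the placeholder text are assumptions, and the regex is deliberately simple rather than a full URL parser.

```python
import re
from typing import AbstractSet, List, Tuple

# Simplified URL matcher (an assumption for this sketch): it stops at
# whitespace and common bracket characters.
URL_PATTERN = re.compile(r"https?://[^\s<>()\[\]]+")

PLACEHOLDER = "[citation unavailable]"


def ground_citations(
    generated_text: str,
    retrieved_urls: AbstractSet[str],
) -> Tuple[str, List[str]]:
    """Replace every URL that retrieval did not return with a placeholder.

    Returns the cleaned text and the list of rejected URLs so they can be
    logged or used to trigger a new retrieval attempt.
    """
    rejected: List[str] = []

    def _check(match: "re.Match[str]") -> str:
        raw = match.group(0)
        # Sentence punctuation glued to the URL is not part of it.
        url = raw.rstrip(".,;:")
        trailing = raw[len(url):]
        if url in retrieved_urls:
            return raw
        rejected.append(url)
        return PLACEHOLDER + trailing

    return URL_PATTERN.sub(_check, generated_text), rejected
```

A caller would pass the set of URLs returned by its retriever as `retrieved_urls`; a non-empty rejected list signals that the model produced at least one ungrounded link.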
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:28:04.320637+00:00— report_created — created