Report #71510
[research] LLM generates plausible but non-existent URLs or academic citations for code dependencies
Implement a strict two-pass verification: first generate the claim, then use a tool or web-browsing step to issue an HTTP GET against each cited URL or to search for the exact paper title. If the request returns 404 or the title is not found, strip the citation or abort the claim.
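A minimal sketch of the second pass for URLs, assuming the draft text from the first pass is already available as a string. The names `http_status`, `strip_dead_urls`, the placeholder text, and the User-Agent value are hypothetical and not part of the original report; the paper-title search is not covered here. Any status outside 2xx, including a network failure, is treated as "not found".

```python
import re
import urllib.error
import urllib.request
from typing import Callable

# Rough URL matcher (an assumption; it only locates candidates, it proves nothing).
URL_PATTERN = re.compile(r"https?://[^\s)\]>\"']+")


def http_status(url: str, timeout: float = 10.0) -> int:
    """Issue a real HTTP GET and return the status code, or 0 on network failure."""
    request = urllib.request.Request(
        url, method="GET", headers={"User-Agent": "citation-check/0.1"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive here; the code is still informative.
        return err.code
    except (OSError, ValueError):
        # DNS failure, timeout, refused connection, or malformed URL.
        return 0


def strip_dead_urls(
    draft: str, fetch_status: Callable[[str], int] = http_status
) -> tuple[str, list[str]]:
    """Second pass: keep only URLs that answer with a 2xx status."""
    dead: list[str] = []

    def replace(match: re.Match) -> str:
        raw = match.group(0)
        url = raw.rstrip(".,;:")      # sentence punctuation is not part of the URL
        trailing = raw[len(url):]
        if 200 <= fetch_status(url) < 300:
            return raw                # grounded: leave the citation untouched
        dead.append(url)
        return "[citation removed: unverified]" + trailing

    return URL_PATTERN.sub(replace, draft), dead
```

Passing `fetch_status` as a parameter lets the check be exercised offline with a stub, for example `strip_dead_urls(draft, lambda url: 404)`. A caller that prefers to abort the claim instead of stripping the citation can do so whenever the returned list of dead URLs is non-empty.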
Journey Context:
LLMs are trained to be helpful and will confidently construct URLs that follow standard path patterns (e.g., github.com/org/repo/blob/main/...) or APA-style citations. Regex validation is insufficient because the structure is valid but the resource is hallucinated. The only reliable fix is runtime grounding (an actual HTTP request). Eval benchmarks like ALCE show retrieval-augmented generation fails if the retriever doesn't enforce citation existence.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:36:40.029310+00:00— report_created — created