Report #62718
[research] LLM generates plausible but non-existent academic citations or GitHub issue URLs when asked for sources
Force the model to output only URLs or citations drawn from an explicitly provided context, via constrained decoding or strict prompt boundaries; never ask an LLM to 'find the URL' without a retrieval tool.
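One way to approximate the "strict prompt boundaries" option is to have the model cite source IDs instead of writing URLs, then map those IDs back to URLs taken from the provided context. The sketch below is a minimal illustration under that assumption, not constrained decoding; the `SOURCES` entries, the `[S<n>]` tag format, and both function names are hypothetical.

```python
import re

# Hypothetical context: the only sources the model is allowed to cite.
SOURCES = {
    "S1": "https://example.com/docs/retrieval",
    "S2": "https://example.com/issues/17",
}

# Assumed citation format: [S1], [S2], ...
CITATION_TAG = re.compile(r"\[(S\d+)\]")


def build_prompt(question: str, sources: dict[str, str]) -> str:
    """List sources by ID and instruct the model to cite IDs only, never URLs."""
    listing = "\n".join(f"[{sid}] {url}" for sid, url in sources.items())
    return (
        "Answer using only the sources below. Cite them as [S<n>]. "
        "Do not write any URL yourself. If no source applies, say so.\n\n"
        f"Sources:\n{listing}\n\nQuestion: {question}"
    )


def resolve_citations(answer: str, sources: dict[str, str]) -> str:
    """Replace [S<n>] tags with URLs from the context; reject unknown IDs."""

    def substitute(match: re.Match) -> str:
        sid = match.group(1)
        if sid not in sources:
            raise ValueError(f"model cited unknown source {sid!r}")
        return f"<{sources[sid]}>"

    return CITATION_TAG.sub(substitute, answer)
```

With this scheme every URL in the final text is copied from the context, so a fabricated citation can only appear as an unknown ID, which `resolve_citations` rejects locally.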
Journey Context:
LLMs are trained to predict plausible token sequences, not truth. A fake URL like github.com/org/repo/issues/1234 has high token probability because it matches the syntactic pattern of real URLs. Relying on post-hoc validation (pinging the URL) is inefficient, and a failed check still means the generation step produced a fabrication. Grounding must be enforced pre-generation.
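To make the "syntactic pattern" point concrete, the sketch below shows that a format check accepts the fabricated URL from the paragraph above, while a membership test against the provided context does not. The regular expression is a loose illustrative pattern rather than a specification of GitHub URLs, and `CONTEXT_URLS` is a hypothetical set standing in for whatever the retrieval step returned.

```python
import re

# Loose, illustrative pattern for GitHub issue URLs (an assumption, not a spec).
ISSUE_URL = re.compile(r"^(https://)?github\.com/[\w.-]+/[\w.-]+/issues/\d+$")

# Hypothetical set of URLs that actually appeared in the retrieved context.
CONTEXT_URLS = {"https://github.com/org/repo/issues/7"}


def looks_like_issue_url(url: str) -> bool:
    """True if the string has the shape of an issue URL; says nothing about existence."""
    return ISSUE_URL.match(url) is not None


def is_grounded(url: str, context_urls: set[str]) -> bool:
    """True only if the URL was present verbatim in the provided context."""
    return url in context_urls


fabricated = "github.com/org/repo/issues/1234"
shape_ok = looks_like_issue_url(fabricated)        # passes the format check
grounded = is_grounded(fabricated, CONTEXT_URLS)   # absent from the context
```

The membership test needs no network call, but it is still a check on finished output; it complements restricting what the model may emit and does not replace it.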
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:45:22.435503+00:00— report_created — created