Report #66042
[research] Agent generates plausible but completely fabricated URLs, DOIs, or academic references
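Never render a URL or DOI generated purely from model weights. If a citation is required, the agent must use a search tool to retrieve a real URL, or explicitly state that it cannot provide one. Use regex matching to find every URL or DOI in the generated output and check each one against the links actually returned by tools. A regex can confirm that a link is well formed, but it cannot tell whether the path is hallucinated.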
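A minimal sketch of such an output gate, assuming the agent keeps the set of links its tools returned during the session. The names `find_ungrounded_links` and `retrieved` are hypothetical, and the patterns are deliberately loose: they only locate candidates, they do not prove a link exists.

```python
import re
from typing import Iterable, List

# Loose candidate patterns (assumptions for illustration, not a full URL/DOI grammar).
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def _normalize(link: str) -> str:
    # Drop trailing sentence punctuation picked up by the loose patterns.
    return link.rstrip(".,;:")


def find_ungrounded_links(text: str, retrieved: Iterable[str]) -> List[str]:
    """Return URLs and DOIs in `text` that no tool call returned."""
    allowed = {_normalize(r) for r in retrieved}
    urls = URL_RE.findall(text)
    # Remove URLs first so a DOI embedded in a URL is not counted twice.
    remaining = URL_RE.sub(" ", text)
    dois = DOI_RE.findall(remaining)
    candidates = [_normalize(c) for c in urls + dois]
    return [c for c in candidates if c not in allowed]
```

Any non-empty result means the draft contains a link that came from generation rather than retrieval, so the agent should strip it or replace it with an explicit statement that no source is available.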
Journey Context:
LLMs are unreliable at generating valid URLs or DOIs because they treat them as text sequences that follow statistical patterns rather than as pointers to real resources. A generated URL might look perfectly formatted (e.g., docs.python.org/3/library/imaginary_module) but resolve to a 404. This is a severe grounding failure. The only reliable fix is to treat URL retrieval as an external tool action, not as generative text.
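One way to picture "URLs as tool actions" is a citation type whose URL field can only be filled by a tool result. This is a sketch under assumptions: `Citation`, `cite`, `render`, and the `search` callable are hypothetical names, and `search` stands in for whatever retrieval tool the agent actually has.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Citation:
    claim: str
    url: Optional[str]  # None means the tool found no source


def cite(claim: str, search: Callable[[str], Optional[str]]) -> Citation:
    """Attach a URL only if the search tool returned one; never synthesize one."""
    return Citation(claim=claim, url=search(claim))


def render(citation: Citation) -> str:
    # State the absence of a source explicitly instead of inventing a link.
    if citation.url is None:
        return f"{citation.claim} (no source could be retrieved)"
    return f"{citation.claim} (source: {citation.url})"
```

Because the model never writes the URL string itself, a missing source surfaces as an explicit statement rather than a plausible-looking link.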
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:19:44.972259+00:00— report_created — created