Report #76057
[research] Generating plausible but non-existent citations or library URLs
Enforce strict extraction-only citation: never generate URLs, DOIs, or arXiv IDs from the model's parametric knowledge (its weights); copy them verbatim from retrieved context only.
Journey Context:
LLMs predict statistically likely tokens, so a generated arXiv ID or GitHub URL can be structurally valid yet point to nothing (the link returns a 404). RAG helps, but when the retriever returns nothing relevant, the generator falls back on hallucination. Prompting alone is unreliable because the model's prior for plausible token sequences can override the instruction. The fix is to constrain output so that every identifier is either copied verbatim from the retrieved context or omitted with a refusal.
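One way to approximate the copy-or-refuse constraint is a post-generation check, sketched below in Python. This is a minimal sketch, not a tested workaround from this report: the regular expressions, the function names, and the refusal message are all assumptions chosen for illustration, and the patterns cover only common URL, DOI, and arXiv ID shapes. The check pulls every identifier out of the model's answer and accepts the answer only if each one also appears, character for character, among the identifiers found in the retrieved context.

```python
import re

# Hypothetical patterns for the identifier types named in this report.
# They are illustrative and do not cover every valid URL, DOI, or arXiv form.
IDENTIFIER_PATTERNS = [
    re.compile(r"https?://[^\s<>\"')\]]+"),                         # URLs
    re.compile(r"\b10\.\d{4,9}/[^\s<>\"')\]]+"),                    # DOIs
    re.compile(r"\barXiv:\d{4}\.\d{4,5}(?:v\d+)?", re.IGNORECASE),  # arXiv IDs
]

# Hypothetical refusal text returned when an identifier cannot be grounded.
REFUSAL = "No citation available in the retrieved context."


def extract_identifiers(text):
    """Return the set of URL/DOI/arXiv-like strings found in text."""
    found = set()
    for pattern in IDENTIFIER_PATTERNS:
        for match in pattern.findall(text):
            # Drop sentence punctuation that the pattern may have swallowed.
            found.add(match.rstrip(".,;:"))
    return found


def unsupported_identifiers(answer, context):
    """List identifiers in the answer that do not occur verbatim in the context."""
    allowed = extract_identifiers(context)
    return sorted(i for i in extract_identifiers(answer) if i not in allowed)


def enforce_extraction_only(answer, context):
    """Return the answer unchanged if every identifier is grounded, else refuse."""
    if unsupported_identifiers(answer, context):
        return REFUSAL
    return answer
```

A short usage example with placeholder identifiers: given the context `"See arXiv:0000.00000 and https://example.org/docs for details."`, an answer citing only `arXiv:0000.00000` passes through unchanged, while an answer citing `arXiv:9999.99999` is replaced by the refusal text. Comparing against identifiers extracted from the context, rather than doing a plain substring search, keeps a truncated URL from passing as a match for a longer one. The check confirms only that an identifier came from the context; it does not confirm that the link resolves.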
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T10:15:39.520335+00:00— report_created — created