Report #66812
[counterintuitive] Model states incorrect facts with high confidence, and no amount of 'only answer if you are sure' prompting eliminates it
Design systems around retrieval-augmented generation (RAG) with source attribution rather than generation from parametric memory. Treat every factual claim from an LLM as unverified until it is grounded in a retrievable source. Never rely on confidence-calibration prompts as a hallucination safeguard.
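A minimal sketch of the retrieve-then-attribute pattern, assuming an in-memory corpus and a naive keyword-overlap retriever. `Passage`, `retrieve`, `grounded_answer`, `STOPWORDS`, and the `doc:` source IDs are hypothetical names chosen for illustration. The generation step is stubbed out: the answer is built only from retrieved passages, each passage carries its source ID, and the function abstains when nothing relevant is retrieved instead of falling back to parametric memory.

```python
import re
from dataclasses import dataclass

# Hypothetical stopword list so that overlap counts only content words.
STOPWORDS = {"a", "an", "the", "is", "of", "in", "what", "which", "who"}


@dataclass(frozen=True)
class Passage:
    source_id: str  # hypothetical source identifier (URL, DB key, ...)
    text: str


def tokenize(text: str) -> set:
    """Lowercased content words; a real system would use BM25 or embeddings."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def retrieve(query: str, corpus: list, k: int = 2, min_overlap: int = 2) -> list:
    """Rank passages by shared content words and drop weak matches."""
    q = tokenize(query)
    scored = [(len(q & tokenize(p.text)), p) for p in corpus]
    strong = [(s, p) for s, p in scored if s >= min_overlap]
    strong.sort(key=lambda sp: sp[0], reverse=True)
    return [p for _, p in strong[:k]]


def grounded_answer(query: str, corpus: list) -> dict:
    """Answer only from retrieved passages, with their sources, or abstain."""
    hits = retrieve(query, corpus)
    if not hits:
        # Nothing retrievable supports an answer: abstain explicitly
        # instead of generating from parametric memory.
        return {"answer": None, "sources": [], "status": "unverified"}
    # Stub for the generation step: quote the passages verbatim. A real
    # pipeline would prompt an LLM with them and require a citation per claim.
    answer = " ".join(p.text for p in hits)
    return {"answer": answer,
            "sources": [p.source_id for p in hits],
            "status": "grounded"}


if __name__ == "__main__":
    corpus = [
        Passage("doc:au-gov", "Canberra is the capital city of Australia."),
        Passage("doc:au-cities", "Sydney is the most populous city in Australia."),
    ]
    print(grounded_answer("What is the capital of Australia?", corpus)["sources"])  # ['doc:au-gov']
    print(grounded_answer("Who founded Atlantis?", corpus)["status"])               # unverified
```

A production system would likely replace the overlap scorer with BM25 or embedding search and pass the retrieved passages to an LLM with an instruction to cite `source_id` for every claim. The design choice this sketch is meant to show is that an empty retrieval result yields an explicit 'unverified' status rather than an answer.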
Journey Context:
The common belief is that hallucination is a bug: a failure mode that better training, RLHF, or careful prompting ('if you don't know, say so') can eliminate. But hallucination is an inherent property of how LLMs work: they generate the most probable continuation given a context, not the most truthful one. The model has no internal mechanism that separates 'sounds plausible' from 'is factually correct'; a fluent falsehood and a true statement can receive similar token probabilities. The model doesn't 'know what it knows', because it has no introspective access to the boundary between reliable and unreliable parametric knowledge. RLHF and calibration prompts reduce overt hallucination, but they can also make the model refuse queries it could answer correctly, or express uncertainty while still generating incorrect content. The only reliable approach is architectural: external grounding through RAG, where every claim is traceable to a source that the system can verify and present to the user.
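To make the 'most probable, not most truthful' point concrete, here is a toy greedy-decoding step over invented probabilities (the numbers are made up for illustration and are not measured from any model). `answer_if_sure` is a hypothetical stand-in for an 'only answer if you are sure' instruction, modeled as a probability threshold. Real prompts do not literally apply a threshold; the simplifying assumption here is only that the signal available to either one is the model's own likelihood, not ground truth.

```python
from typing import Dict, Optional

# Invented next-token probabilities after the prompt below
# (illustrative numbers only; not measured from any real model).
PROMPT = "The capital of Australia is"
NEXT_TOKEN_PROBS: Dict[str, float] = {"Sydney": 0.55, "Canberra": 0.40, "Melbourne": 0.05}

# Ground truth lives outside the model; decoding never consults it.
KNOWN_FACTS = {PROMPT: "Canberra"}


def greedy_next_token(probs: Dict[str, float]) -> str:
    """Greedy decoding: return the single most probable continuation."""
    return max(probs, key=probs.get)


def answer_if_sure(probs: Dict[str, float], threshold: float = 0.5) -> Optional[str]:
    """Hypothetical 'only answer if you are sure' gate, modeled as a threshold.

    The only signal it can inspect is the model's own probability.
    """
    token = greedy_next_token(probs)
    return token if probs[token] >= threshold else None


choice = answer_if_sure(NEXT_TOKEN_PROBS)
print(choice)                          # Sydney
print(choice == KNOWN_FACTS[PROMPT])   # False (passed the gate, and wrong)
```

Raising the threshold to 0.6 makes the gate return `None` here, but it would equally suppress a correct answer that happened to have the same probability: the gate trades wrong answers for refusals without being able to tell them apart.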
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T18:37:33.207067+00:00— report_created — created