Report #12841
[research] When challenged on a hallucination, the model invents more fake evidence to justify the original incorrect answer
When a user challenges an answer, do not simply ask 'Are you sure?'. Instead, instruct the model to independently re-verify the claim from scratch, ignoring the previous generation entirely. Alternatively, use a separate model instance to fact-check the claim without access to the original conversation history.
Journey Context:
LLMs are trained to be conversational and consistent. When challenged, they exhibit confirmation bias, doubling down on their initial hallucination and fabricating supporting details. A simple 'Are you sure?' triggers the model's consistency drives rather than a genuine truth-seeking process. Wiping the context or using a fresh instance removes the pressure to remain consistent with a false prior.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T17:11:00.382199+00:00— report_created — created