Report #68691
[counterintuitive] Why does the model confidently give wrong answers instead of saying it doesn't know?
Implement external calibration, retrieval-based verification, or explicit confidence thresholds; do not rely on the model's own self-assessment of knowledge to prevent hallucinations.
Journey Context:
The tempting assumption is that confident output indicates knowledge. It does not: LLMs are text continuers, not calibrated truth-tellers. They generate a probable continuation given their training data, and that continuation may be a confident-sounding hallucination. The model has no reliable internal mechanism to distinguish 'I know this fact' from 'this sounds like something I've seen.' Prompting with 'say I don't know if unsure' helps only marginally and adds a failure mode of its own: the model may now refuse questions it would have answered correctly, while still hallucinating confidently on others. The root problem is that the model's self-assessment comes from the same flawed representation as its generation, so it cannot serve as an independent check.
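One way to picture retrieval-based verification with an explicit threshold is the minimal sketch below. Everything in it is an assumption made for illustration: `generate` and `retrieve` are hypothetical callables standing in for a model call and a search over a trusted corpus, `support_score` is a crude token-overlap proxy (a real system would more likely use an entailment or citation-checking step), and the 0.9 threshold is arbitrary and would need tuning on labeled data.

```python
import re
from typing import Callable, List, Optional, Set


def token_set(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def support_score(answer: str, passages: List[str]) -> float:
    """Fraction of answer tokens found in the best-matching passage.

    This is a deliberately crude stand-in for a real verification step.
    """
    answer_tokens = token_set(answer)
    if not answer_tokens or not passages:
        return 0.0
    return max(
        len(answer_tokens & token_set(passage)) / len(answer_tokens)
        for passage in passages
    )


def answer_or_abstain(
    question: str,
    generate: Callable[[str], str],        # hypothetical model call
    retrieve: Callable[[str], List[str]],  # hypothetical trusted-corpus search
    threshold: float = 0.9,                # arbitrary; tune on labeled data
) -> Optional[str]:
    """Return the draft answer only if external evidence supports it."""
    draft = generate(question)
    passages = retrieve(question)
    # The accept/abstain decision uses retrieved evidence,
    # not the model's own stated confidence.
    if support_score(draft, passages) >= threshold:
        return draft
    return None  # caller should surface "I don't know" or escalate


if __name__ == "__main__":
    corpus = ["Canberra is the capital city of Australia."]
    retrieve = lambda q: corpus
    question = "What is the capital of Australia?"
    print(answer_or_abstain(question, lambda q: "Canberra is the capital of Australia", retrieve))
    print(answer_or_abstain(question, lambda q: "Sydney is the capital of Australia", retrieve))
```

The point of the design is that the abstain decision lives outside the model: the draft is checked against evidence the model did not produce. The token-overlap proxy is fragile (the incorrect draft above still scores about 0.83), which is why the threshold and the scoring method both need validation before use.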
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:46:54.468444+00:00— report_created — created