Report #44174
[research] LLMs output high-confidence answers even when their internal likelihoods are low, failing to say 'I don't know'
Implement calibrated confidence thresholds using token logprobs or self-consistency sampling; trigger a fallback ('I don't know' or external tool use) when the variance across sampled answers is high or the answer probability falls below the threshold.
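A minimal sketch of the logprob threshold, assuming the serving API already returns one log-probability per generated token. The function names, the geometric-mean aggregation, and the 0.75 threshold are illustrative assumptions, not part of the report; a usable threshold would need to be calibrated on held-out data.

```python
import math
from typing import Sequence


def sequence_confidence(token_logprobs: Sequence[float]) -> float:
    """Geometric-mean token probability: exp(mean logprob).

    Returns 0.0 for an empty sequence so that a missing signal
    is treated as "not confident" (an assumption of this sketch).
    """
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def answer_or_abstain(
    answer: str,
    token_logprobs: Sequence[float],
    threshold: float = 0.75,  # hypothetical value; calibrate on real data
    fallback: str = "I don't know",
) -> str:
    """Return the answer only if its confidence clears the threshold."""
    if sequence_confidence(token_logprobs) >= threshold:
        return answer
    return fallback


if __name__ == "__main__":
    # Logprobs near 0 mean high token probability, so the answer is kept.
    print(answer_or_abstain("Paris", [-0.05, -0.10]))
    # Strongly negative logprobs fall below the threshold, so we abstain.
    print(answer_or_abstain("Lyon", [-1.0, -2.0]))
```

The fallback string could equally be replaced by a call to an external tool; the sketch only shows where that branch would sit.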
Journey Context:
Simply prompting 'say I don't know if you aren't sure' is unreliable because models lack metacognitive awareness of their own uncertainty and are trained to always provide an answer. Thresholding token logprobs, or sampling multiple reasoning paths (Self-Consistency) and measuring how much the final answers vary, provides a quantitative uncertainty signal that code can act on, so abstention is enforced programmatically rather than left to the prompt.
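The self-consistency check can be sketched as follows, assuming the final answers from several independently sampled reasoning paths have already been extracted as strings. For categorical answers this sketch uses the majority agreement ratio as a stand-in for variance; the function name, the normalization step, and the 0.6 cutoff are assumptions made for illustration.

```python
from collections import Counter
from typing import Sequence, Tuple


def self_consistent_answer(
    samples: Sequence[str],
    min_agreement: float = 0.6,  # hypothetical cutoff; tune per task
    fallback: str = "I don't know",
) -> Tuple[str, float]:
    """Majority-vote over sampled answers and abstain on low agreement.

    Returns (answer_or_fallback, agreement), where agreement is the
    fraction of non-empty samples that match the most common answer.
    """
    # Light normalization so "42" and "42 " count as the same answer.
    normalized = [s.strip().lower() for s in samples if s.strip()]
    if not normalized:
        return fallback, 0.0
    top, count = Counter(normalized).most_common(1)[0]
    agreement = count / len(normalized)
    if agreement >= min_agreement:
        return top, agreement
    return fallback, agreement


if __name__ == "__main__":
    # Four of five samples agree, so the majority answer is returned.
    print(self_consistent_answer(["42", "42 ", "41", "42", "42"]))
    # Every sample differs, so the function abstains.
    print(self_consistent_answer(["a", "b", "c"]))
```

Exact string matching is the simplest comparison; free-form answers would need a task-specific equivalence check before voting.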
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:37:02.726288+00:00— report_created — created