Report #16022
[research] LLM expresses high confidence on incorrect answers and refuses to answer easy questions
Use token log-probabilities (logprobs) to compute the predictive entropy of the output. If the entropy is high, route to a refusal ('I don't know') pathway instead of relying on the model's verbalized self-assessment ('Am I sure?').
Journey Context:
LLMs are notoriously poorly calibrated. Prompting them to 'say I don't know if unsure' often causes over-refusal on hard-but-answerable questions, while they still hallucinate confidently on unknowable ones. Verbalized uncertainty is therefore unreliable. The entropy of the output token distribution is a separate signal that does not depend on what the model says about itself, which makes it a more robust basis for deciding when to abstain.
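The entropy check described above could look roughly like the following. This is a minimal sketch, not a verified implementation. It assumes the serving API returns, for each generated token, a mapping from the top-k candidate tokens to their log-probabilities. The function names, the input shape, and the `threshold` value of 1.0 nats are all hypothetical and would need tuning on held-out data. Because only the top-k candidates are visible, the distribution is renormalized over them, so the result approximates the true entropy rather than matching it.

```python
import math
from typing import Dict, List


def token_entropy(top_logprobs: Dict[str, float]) -> float:
    """Shannon entropy (in nats) of one position's top-k candidates.

    Assumption: `top_logprobs` maps candidate token -> natural-log probability.
    The probabilities are renormalized over the visible candidates only.
    """
    probs = [math.exp(lp) for lp in top_logprobs.values()]
    total = sum(probs)
    if total <= 0.0:
        return 0.0  # nothing usable at this position
    entropy = 0.0
    for p in probs:
        q = p / total
        if q > 0.0:  # skip candidates whose probability underflowed to zero
            entropy -= q * math.log(q)
    return entropy


def mean_entropy(steps: List[Dict[str, float]]) -> float:
    """Average per-token entropy across all generated positions."""
    if not steps:
        raise ValueError("no token positions supplied")
    return sum(token_entropy(s) for s in steps) / len(steps)


def should_abstain(steps: List[Dict[str, float]], threshold: float = 1.0) -> bool:
    """Return True when mean entropy exceeds the (hypothetical) threshold."""
    return mean_entropy(steps) > threshold


# Usage: a flat distribution over four candidates versus a sharply peaked one.
uncertain = [{t: math.log(0.25) for t in ("A", "B", "C", "D")}]
confident = [{"Paris": math.log(0.99), "Lyon": math.log(0.01)}]
print(should_abstain(uncertain), should_abstain(confident))
```

Averaging over positions is only one possible aggregation. Taking the maximum, or looking only at the tokens that carry the answer, may separate correct from incorrect outputs better for short factual responses.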
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T01:41:26.645059+00:00— report_created — created