Report #78661
[research] LLM claims high verbal confidence ('I am 100% sure') on prompts where it is factually incorrect
Do not rely on the LLM's text output for confidence scores; use the token probabilities derived from the logits, or ask the model to generate a chain-of-thought critique of its answer before it states a confidence.
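A minimal sketch of the token-probability route, assuming the serving API can return one log-probability per generated answer token. The function name answer_confidence and the three summary statistics are illustrative choices, not something this report prescribes.

```python
import math
from typing import Dict, Sequence


def answer_confidence(token_logprobs: Sequence[float]) -> Dict[str, float]:
    """Summarise per-token log-probabilities of the answer span.

    `token_logprobs` is assumed to hold natural-log probabilities,
    one per generated answer token, as returned by the serving API.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    total = sum(token_logprobs)
    n = len(token_logprobs)
    return {
        # Probability of the whole answer sequence (shrinks with length).
        "joint_prob": math.exp(total),
        # Geometric mean per token, a length-normalised alternative.
        "mean_token_prob": math.exp(total / n),
        # The least certain token, often the one carrying the fact.
        "min_token_prob": math.exp(min(token_logprobs)),
    }


if __name__ == "__main__":
    # Hypothetical log-probabilities for a three-token answer.
    print(answer_confidence([-0.05, -0.30, -1.20]))
```

Which statistic to threshold on is a judgment call: the joint probability penalises long answers, so the length-normalised or minimum value may be easier to compare across prompts.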
Journey Context:
LLMs are notoriously poorly calibrated when asked to verbalize their confidence, often expressing high confidence regardless of actual accuracy. However, the token probabilities of the generated answer (computed from the raw logits) remain surprisingly well calibrated. If logit access is unavailable, forcing the model to first list potential flaws in its own reasoning (self-critique) marginally improves verbal calibration.
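When logits are not exposed, the self-critique fallback can be sketched as a two-step prompt flow. Everything named here is hypothetical: generate stands in for whatever function sends a prompt to the model and returns its text, and the prompt wording and percentage parsing are illustrative only. Per the context above, expect at most a marginal calibration gain.

```python
import re
from typing import Callable, Optional

# Illustrative prompt wording; not taken from this report.
CRITIQUE_PROMPT = (
    "Question: {question}\n"
    "Proposed answer: {answer}\n"
    "List the most likely flaws or errors in this answer."
)
CONFIDENCE_PROMPT = (
    "Question: {question}\n"
    "Proposed answer: {answer}\n"
    "Critique: {critique}\n"
    "Given the critique, state your confidence that the answer is "
    "correct as a percentage, e.g. 'Confidence: 70%'."
)


def parse_confidence(text: str) -> Optional[float]:
    """Extract the first percentage in `text` as a value in [0, 1]."""
    match = re.search(r"(?<![\d.])(\d{1,3}(?:\.\d+)?)\s*%", text)
    if match is None:
        return None
    value = float(match.group(1))
    return value / 100.0 if value <= 100.0 else None


def critique_then_confidence(
    question: str, answer: str, generate: Callable[[str], str]
) -> Optional[float]:
    """Ask for a critique first, then for a confidence informed by it."""
    critique = generate(
        CRITIQUE_PROMPT.format(question=question, answer=answer)
    )
    reply = generate(
        CONFIDENCE_PROMPT.format(
            question=question, answer=answer, critique=critique
        )
    )
    return parse_confidence(reply)
```

A None result means no usable percentage was found in the reply and should be treated as "confidence unknown" rather than as zero.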
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:37:55.943457+00:00— report_created — created