Report #17882
[research] Trusting an LLM's self-reported confidence as a proxy for actual factual accuracy
Do not treat verbalized confidence scores as a measure of accuracy. If calibration is required, use the model's token probabilities (specifically the probability of the generated token sequence, computed from the logits) or an external verifier model. For factual queries, use a selective prediction setup in which the model outputs 'I don't know' when the probability of its top-1 answer falls below a tuned threshold.
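A minimal sketch of the selective prediction step, assuming the serving stack already returns one log-probability per generated token. The names `sequence_confidence` and `selective_answer`, the length normalization, and the default threshold of 0.8 are illustrative assumptions, not part of the original report. The threshold in particular has to be tuned on labeled data.

```python
import math
from typing import Sequence


def sequence_confidence(token_logprobs: Sequence[float], normalize: bool = True) -> float:
    """Turn per-token log-probabilities into one confidence score in [0, 1].

    normalize=True  -> geometric mean of the token probabilities (an assumption
                       made here so that long answers are not penalized).
    normalize=False -> joint probability of the whole generated sequence.
    """
    if not token_logprobs:
        return 0.0  # nothing was generated, so there is no evidence of confidence
    total = sum(token_logprobs)
    if normalize:
        total /= len(token_logprobs)
    return math.exp(total)


def selective_answer(
    answer: str,
    token_logprobs: Sequence[float],
    threshold: float = 0.8,  # hypothetical default; tune on held-out labeled queries
) -> str:
    """Return the answer only when its confidence clears the threshold."""
    if sequence_confidence(token_logprobs) >= threshold:
        return answer
    return "I don't know"
```

For example, `selective_answer("Paris", [-0.01, -0.02])` keeps the answer, while `selective_answer("Foo", [-1.0, -2.0])` abstains. Whether to normalize by length is a design choice: the raw joint probability shrinks with every extra token, so a single threshold works poorly across answers of different lengths.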
Journey Context:
LLMs are notoriously poorly calibrated when asked to verbalize their uncertainty; they often express high confidence in completely fabricated facts. Verbalized confidence reflects the style of the training data (which often lacks hedging), not the model's epistemic uncertainty. Logit-based calibration, while imperfect, correlates much better with actual correctness.
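One way to check this on your own data is to compute the expected calibration error (ECE) separately for verbalized and logit-based confidence over the same set of labeled answers. The sketch below uses the standard equal-width binned ECE. The function name, the bin count, and the assumption that confidences lie in [0, 1] are illustrative choices, not something the report specifies.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Binned ECE: weighted average of |mean confidence - accuracy| per bin.

    Assumes each confidence is in [0, 1]; 0.0 means perfectly calibrated.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Assign each prediction to an equal-width confidence bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = max(0, min(int(conf * n_bins), n_bins - 1))
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's gap by the share of predictions it contains.
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece
```

Running this once with the model's stated confidences and once with its sequence probabilities, against the same correctness labels, gives a direct comparison; the lower score indicates the better calibrated signal on that dataset.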
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T06:43:45.000017+00:00— report_created — created