Report #6626
[research] Trusting Verbalized Confidence Over Statistical Calibration
Do not rely on confidence figures the LLM states in its text output. Use token probabilities \(logprobs\) from the model API, or an external calibration model \(such as a separate verifier or scorer\), to assess factual reliability. If logprobs aren't available, sample several generations and measure their agreement \(self-consistency\), or prompt the model to critique its own answer \(Constitutional-style self-critique\); never trust a single generation's stated confidence.
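A minimal sketch of the two statistical signals mentioned above, assuming the token logprobs and the sampled answers have already been fetched from whatever model API is in use. The function names `sequence_confidence` and `self_consistency` are hypothetical helpers written for illustration, not part of any library, and the lowercase/strip normalization is a simplifying assumption that real answers may need to replace with something stricter.

```python
import math
from collections import Counter


def sequence_confidence(token_logprobs):
    """Return (joint probability, length-normalized probability) of one answer.

    token_logprobs: natural-log probabilities of the generated tokens,
    as returned by an API that exposes logprobs (assumed input).
    """
    if not token_logprobs:
        raise ValueError("need at least one token logprob")
    total = sum(token_logprobs)
    joint = math.exp(total)                            # probability of the whole sequence
    per_token = math.exp(total / len(token_logprobs))  # geometric mean per token
    return joint, per_token


def self_consistency(answers):
    """Return (majority answer, fraction of samples that agree with it).

    answers: final answers extracted from several independent generations.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    normalized = [a.strip().lower() for a in answers]  # crude normalization (assumption)
    top, count = Counter(normalized).most_common(1)[0]
    return top, count / len(normalized)


if __name__ == "__main__":
    print(sequence_confidence([-0.05, -0.20, -0.01]))
    print(self_consistency(["Paris", "paris ", "Lyon", "Paris"]))
```

The length-normalized value is usually easier to compare across answers of different lengths than the joint probability, which shrinks with every extra token. Either signal is only a proxy for reliability and still needs to be validated against labeled data.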
Journey Context:
LLMs' verbalized confidence is poorly calibrated: stated certainty does not correlate strongly with factual accuracy \(calibration is evaluated in Kadavath et al., 2022\). A model will hallucinate confidently because the language patterns of confidence are statistically likely given the prompt context, not because the claim is true. Treating text such as 'I am 90% sure' as a reliability score is a critical anti-pattern.
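To make "calibrated" concrete, here is a small sketch of expected calibration error, a common way to measure the gap between stated confidence and observed accuracy. It assumes a labeled evaluation set is available; `expected_calibration_error` and its `n_bins` parameter are illustrative names chosen here, not taken from the report or from a specific library.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin.

    confidences: scores in [0, 1], e.g. a parsed "90% sure" -> 0.9.
    correct: booleans, True where the corresponding answer was right.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("need at least one example")

    # Group examples into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece
```

A model that says it is 90% sure on four answers and gets only one right scores 0.65 on this measure, while a perfectly calibrated one scores 0. Running the same check on logprob-derived scores and on verbalized scores is one way to see which signal deserves trust for a given model and task.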
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T00:36:43.544735+00:00— report_created — created