Report #7403
[research] LLM states facts with high confidence when it is likely wrong, or uses hedging language when it is highly certain (poor calibration)
Use token probabilities (logprobs) to estimate confidence rather than relying on the hedging in the generated text. If logprobs are unavailable, use self-consistency as a proxy for confidence: sample N answers to the same prompt and measure how often they agree.
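A minimal sketch of both signals, assuming the caller has already obtained per-token logprobs for the answer span, or N sampled answer strings, from whatever API is in use. The function names `logprob_confidence` and `self_consistency` are hypothetical, and the exact-match normalization (strip and lowercase) is an assumption that only suits short factual answers.

```python
import math
from collections import Counter


def logprob_confidence(token_logprobs):
    """Geometric-mean token probability of an answer span, in (0, 1].

    Averaging the logprobs before exponentiating keeps long answers
    from being penalized merely for having more tokens.
    """
    if not token_logprobs:
        raise ValueError("need at least one token logprob")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def self_consistency(answers):
    """Return (majority_answer, agreement) over N sampled answers.

    agreement is the fraction of samples that match the most common
    answer after a simple normalization (assumed: strip + lowercase).
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    normalized = [a.strip().lower() for a in answers]
    top, count = Counter(normalized).most_common(1)[0]
    return top, count / len(normalized)
```

For example, `self_consistency(["Paris", "paris ", "Lyon", "Paris"])` returns `("paris", 0.75)`. Either score is only a proxy: a threshold for "confident enough" should be chosen by comparing scores with actual correctness on labeled examples.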
Journey Context:
LLMs are often poorly calibrated out of the box, and RLHF can make this worse by pushing the probabilities of preferred outputs higher, so the model sounds confident even when it is wrong. Textual hedging correlates poorly with actual factual accuracy. Kadavath et al. (2022) showed that LLMs can be trained to predict whether their own answers are correct; the wording of a raw generation, by contrast, is not a calibrated signal. Logprobs or agreement across samples give a numeric confidence score that can be checked against observed accuracy, which hedging language cannot.
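One way to check whether a confidence score is actually calibrated is to compare it with observed accuracy on labeled examples. The sketch below computes expected calibration error (ECE) with equal-width bins. It is an illustration under stated assumptions rather than a method taken from this report: the function name `expected_calibration_error`, the default of 10 bins, and the input format (parallel lists of confidences in [0, 1] and correctness flags) are all hypothetical choices.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean gap between average confidence and accuracy per bin.

    confidences: floats in [0, 1], one per answer.
    correct: booleans, True where the answer was right.
    A value near 0 means the scores track accuracy well.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have equal length")
    if not confidences:
        raise ValueError("need at least one example")

    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidence must be in [0, 1]")
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece
```

If four answers each carry confidence 0.9 and three of them are correct, the single occupied bin has average confidence 0.9 and accuracy 0.75, so the ECE is 0.15.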
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T02:40:00.315125+00:00— report_created — created