Report #83138
[counterintuitive] Using logprobs as reliable confidence scores for LLM answers
Avoid using raw logprobs for calibrated uncertainty estimation; use self-consistency (sampling multiple reasoning paths) or fine-tuned probes if confidence scoring is critical.
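A minimal sketch of the self-consistency idea, assuming a caller-supplied `sample_answer` function (hypothetical, not part of any library) that queries the model once at non-zero temperature and returns only the final answer string. The agreement rate across samples serves as a rough confidence signal; it is not a calibrated probability on its own.

```python
from collections import Counter
from typing import Callable, Tuple


def self_consistency(
    sample_answer: Callable[[str], str],  # hypothetical: one sampled final answer per call
    prompt: str,
    n: int = 10,
) -> Tuple[str, float]:
    """Sample n answers; return the majority answer and its agreement rate."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Light normalization so trivial formatting differences do not split votes.
    answers = [sample_answer(prompt).strip().lower() for _ in range(n)]
    top_answer, votes = Counter(answers).most_common(1)[0]
    return top_answer, votes / n


if __name__ == "__main__":
    import itertools

    # Stand-in sampler for illustration only; a real one would call the model.
    fake_outputs = itertools.cycle(["Paris", "paris ", "Lyon", "Paris"])
    answer, agreement = self_consistency(lambda _p: next(fake_outputs), "Capital of France?", n=4)
    print(answer, agreement)
```

Low agreement suggests the answer should be treated as uncertain. High agreement is weaker evidence, since a model can be consistently wrong.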
Journey Context:
It seems intuitive that a token assigned 99% probability (a logprob near -0.01) reflects high confidence in factual correctness. However, a token's probability measures how likely that token is as a continuation of the context, not whether the resulting claim is true. RLHF (Reinforcement Learning from Human Feedback) can also distort the base model's probability distribution and degrade its calibration, which tends to leave tuned models overconfident. A model can therefore assign high logprobs to hallucinated facts simply because they sound plausible in the RLHF-tuned style, decoupling probability from factual accuracy.
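One way to check this on your own data is to compare logprob-derived confidences against labeled correctness. The sketch below computes expected calibration error (ECE) with equal-width bins. The function names and the choice of 10 bins are assumptions for illustration, and the inputs are assumed to be per-answer probabilities in [0, 1] paired with ground-truth labels you collected yourself.

```python
import math
from typing import Sequence


def logprob_to_prob(logprob: float) -> float:
    """Convert a natural-log probability back to a probability."""
    return math.exp(logprob)


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Weighted average gap between mean confidence and accuracy per bin."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("at least one example is required")

    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece
```

An ECE near 0 means stated confidence tracks accuracy. If answers scored at roughly 0.99 are right only half the time, the ECE approaches 0.49, which is the overconfidence pattern described above.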
Lifecycle
2026-06-21T22:08:20.026278+00:00— report_created — created