Report #40011
[research] LLM claims high confidence ('I am certain') for factually incorrect outputs
Do not rely on the LLM's self-reported confidence strings. Instead, estimate confidence from the model's outputs directly: use logit-based probabilities (the gap between the top-1 and top-2 token probabilities) or multiple sampling (self-consistency: how often independently sampled answers agree).
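A minimal Python sketch of both estimators follows. It assumes you already have the raw logits for one generated token position, and a list of final answers extracted from several samples of the same prompt at nonzero temperature. How you obtain either depends on your inference stack or API and is not shown. `softmax`, `top2_margin`, and `self_consistency` are hypothetical helper names, not a library API. A small margin or a low agreement rate flags an answer worth checking, whatever confidence the generated text claims.

```python
import math
from collections import Counter
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over one position's vocabulary logits."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top2_margin(logits: Sequence[float]) -> float:
    """Probability gap between the most and second-most likely tokens.

    Near 0: the model is torn between two tokens. Near 1: one token dominates.
    Requires at least two logits.
    """
    probs = sorted(softmax(logits), reverse=True)
    return probs[0] - probs[1]


def self_consistency(answers: Sequence[str]) -> tuple[str, float]:
    """Majority answer across samples and the fraction of samples agreeing with it.

    Assumes `answers` are already normalized final answers (e.g. stripped and
    lowercased) taken from N independent samples of the same prompt.
    """
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)


if __name__ == "__main__":
    # Hypothetical logits for a single generated position (toy 4-token vocabulary).
    print(top2_margin([2.0, 1.9, -1.0, -3.0]))  # small gap: uncertain
    print(top2_margin([6.0, 1.0, 0.5, -2.0]))   # large gap: confident
    print(self_consistency(["paris", "paris", "lyon", "paris", "paris"]))  # → ('paris', 0.8)
```

For a multi-token answer the per-token margins still have to be combined into one score; taking the minimum over the answer's tokens is one conservative option, but that aggregation is an assumption here, not part of this report.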
Journey Context:
LLMs are poorly calibrated: their verbalized confidence correlates only weakly with actual accuracy, because they are trained to sound authoritative and helpful. Logit-based calibration and self-consistency checking provide a mathematically grounded proxy for uncertainty, sidestepping the model's inability to reliably express its own epistemic limits in natural language.
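To check the weak confidence/accuracy link on your own labeled data, one standard measure is expected calibration error (ECE): bucket answers by confidence, then take the size-weighted average of the gap between each bucket's accuracy and its mean confidence. The sketch below is plain Python; the function name, the 10-bin default, and the toy inputs (including mapping "I am certain" to 1.0) are illustrative assumptions, not data from this report.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Size-weighted mean of |accuracy - mean confidence| over equal-width bins.

    confidences: stated or estimated probabilities in [0, 1], one per answer.
    correct: whether each answer was actually right.
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equal-length, non-empty inputs")
    bins: list[list[int]] = [[] for _ in range(n_bins)]
    for i, c in enumerate(confidences):
        # Clamp so a confidence of exactly 1.0 lands in the last bin.
        bins[min(int(c * n_bins), n_bins - 1)].append(i)
    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        acc = sum(correct[i] for i in members) / len(members)
        conf = sum(confidences[i] for i in members) / len(members)
        ece += (len(members) / n) * abs(acc - conf)
    return ece


if __name__ == "__main__":
    # Toy labels: only half of the answers are actually correct.
    correct = [True, False, True, False]
    verbalized = [1.0, 1.0, 1.0, 1.0]  # "I am certain" mapped to 1.0 (assumed mapping)
    sampled = [0.8, 0.4, 0.8, 0.2]     # e.g. self-consistency agreement rates
    print(expected_calibration_error(verbalized, correct))  # → 0.5
    print(expected_calibration_error(sampled, correct))
```

Lower is better. In the toy run, a source that claims certainty on every answer while being right half the time scores 0.5; comparing the two scores on the same labeled set shows which confidence signal tracks accuracy more closely.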
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:37:47.701881+00:00 | report_created | created