Report #10751
[research] LLM states falsehoods with the same high confidence as truths, failing to express calibrated uncertainty
Elicit verbalized confidence scores or use self-consistency checks (sample multiple generations; if they diverge, output low confidence or 'I don't know').
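A minimal sketch of the verbalized-confidence half of this workaround. The prompt suffix, the `Confidence:` reply format, and the 0.7 threshold are all assumptions made for illustration, not part of any particular model's API. The stated number is only as trustworthy as the model's calibration, so treat it as a rough signal.

```python
import re
from typing import Optional, Tuple

# Hypothetical instruction appended to the prompt; the wording is an assumption.
CONFIDENCE_SUFFIX = (
    "\nGive your answer, then on a new line write "
    "'Confidence: <number between 0 and 1>'."
)

_CONF_RE = re.compile(r"confidence:\s*(\d+(?:\.\d+)?)", re.IGNORECASE)


def parse_confidence(reply: str) -> Tuple[str, Optional[float]]:
    """Split a reply into (answer, confidence).

    Confidence is None when the line is missing or the value is outside [0, 1].
    """
    match = _CONF_RE.search(reply)
    if match is None:
        return reply.strip(), None
    answer = reply[: match.start()].strip()
    value = float(match.group(1))
    if not 0.0 <= value <= 1.0:
        return answer, None
    return answer, value


def answer_or_abstain(reply: str, threshold: float = 0.7) -> str:
    """Return the answer only when the stated confidence clears the threshold."""
    answer, confidence = parse_confidence(reply)
    if confidence is None or confidence < threshold:
        return "I don't know"
    return answer
```

Treating a missing or malformed confidence line as an abstention is a deliberate choice here: it fails closed rather than passing an unscored answer through.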
Journey Context:
Standard temperature sampling does not inherently map to epistemic uncertainty: a model can be consistently wrong, so agreement across samples does not guarantee correctness. Verbalized probabilities show some calibration but are brittle. Self-consistency (majority vote across multiple samples) is a more robust proxy for confidence, though computationally expensive, since each query costs several generations.
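The self-consistency check can be sketched as follows. This is an illustration under stated assumptions: `sample_fn` is a hypothetical callable standing in for whatever client returns one sampled completion, the exact-match normalization only suits short answers, and the 0.6 agreement threshold is a placeholder to tune.

```python
from collections import Counter
from typing import Callable, List, Tuple


def normalize(answer: str) -> str:
    """Lowercase, trim, and drop trailing periods so trivial variants compare equal."""
    return answer.strip().lower().rstrip(".")


def self_consistent_answer(
    sample_fn: Callable[[str], str],  # hypothetical: returns one sampled completion
    prompt: str,
    n_samples: int = 5,
    min_agreement: float = 0.6,  # assumed threshold; tune on held-out data
) -> Tuple[str, float]:
    """Return (answer, agreement), abstaining when the samples diverge."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    # Draw several independent samples for the same prompt.
    answers: List[str] = [normalize(sample_fn(prompt)) for _ in range(n_samples)]
    # Majority vote: the share of samples backing the top answer is the proxy.
    top_answer, top_count = Counter(answers).most_common(1)[0]
    agreement = top_count / n_samples
    if agreement < min_agreement:
        return "I don't know", agreement
    return top_answer, agreement
```

High agreement only shows that the model is consistent, not that it is right, so a consistently wrong model still passes this check. The cost also scales linearly with `n_samples`.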
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T11:38:35.493437+00:00— report_created — created