Report #99378
[research] Verbalized confidence ('I am 90% sure') is miscalibrated
Fine-tune or prompt the model to emit probabilities, check their calibration on a held-out set, then use those probabilities as an abstention threshold: answer only when confidence exceeds the calibrated cutoff; otherwise say you don't know.
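A minimal sketch of the abstention step, assuming you already have held-out (confidence, correct) pairs for your task. The names `pick_cutoff`, `answer_or_abstain`, and `target_accuracy` are hypothetical and chosen for illustration; they are not from any library. The sketch treats "exceeds the cutoff" as confidence at or above the cutoff.

```python
from typing import Optional, Sequence


def pick_cutoff(
    confidences: Sequence[float],
    correct: Sequence[bool],
    target_accuracy: float = 0.9,
) -> Optional[float]:
    """Return the lowest cutoff c such that held-out examples with
    confidence >= c reach target_accuracy, or None if no cutoff does."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")

    # Walk from most to least confident, tracking running accuracy.
    pairs = sorted(zip(confidences, correct), key=lambda p: p[0], reverse=True)
    best = None
    hits = 0
    for i, (conf, ok) in enumerate(pairs, start=1):
        hits += int(ok)
        # Only evaluate where the confidence value changes, so tied
        # examples are either all answered or all abstained on.
        is_boundary = i == len(pairs) or pairs[i][0] != conf
        if is_boundary and hits / i >= target_accuracy:
            best = conf
    return best


def answer_or_abstain(answer: str, confidence: float, cutoff: Optional[float]) -> str:
    """Answer only when confidence clears the held-out cutoff."""
    if cutoff is None or confidence < cutoff:
        return "I don't know"
    return answer


# Illustrative held-out data (made up for this example).
held_out_conf = [0.95, 0.9, 0.8, 0.7, 0.6]
held_out_correct = [True, True, True, False, False]
cutoff = pick_cutoff(held_out_conf, held_out_correct, target_accuracy=0.9)
print(cutoff)
print(answer_or_abstain("Paris", 0.85, cutoff))
print(answer_or_abstain("Paris", 0.50, cutoff))
```

Choosing the lowest cutoff that still meets the target keeps coverage as high as possible. A stricter target trades answered questions for accuracy, so the cutoff should be re-derived whenever the task or the model changes.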
Journey Context:
Raw verbalized confidences from LLMs tend to be overconfident. Lin et al. show that models can learn to express uncertainty in words in a calibrated way, and follow-up work finds that simple elicitation strategies improve the calibration of confidence scores from RLHF models. The key is to measure calibration on your own task rather than trust the model's tone.
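One common way to measure calibration on your own task is expected calibration error (ECE) with equal-width bins. The sketch below is an illustration under that assumption, not the method used in the cited work; the function name and the `n_bins` default are hypothetical choices.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Weighted average gap between stated confidence and observed
    accuracy, computed over equal-width confidence bins on [0, 1]."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("need at least one example")

    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Confidence 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece


# Illustrative data: the model says "90% sure" ten times.
well_calibrated = expected_calibration_error([0.9] * 10, [True] * 9 + [False])
overconfident = expected_calibration_error([0.9] * 10, [True] * 5 + [False] * 5)
print(round(well_calibrated, 3), round(overconfident, 3))
```

A value near zero means stated confidence tracks observed accuracy. In the second case the model claims 90% but is right half the time, which is the overconfidence pattern this report describes.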
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-29T05:02:18.879507+00:00— report_created — created