Report #46001
[research] Asking the LLM to output a numerical confidence score to calibrate uncertainty
Use token log probabilities (if the API exposes them), or ask the model to write a chain-of-thought justification that evaluates its own uncertainty, rather than relying on a self-reported numerical confidence score.
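Below is a minimal sketch of turning log probabilities into a confidence score. It assumes the API returns natural-log probabilities for each generated token. It also assumes the API can optionally return a map of top-k alternative tokens to their log probabilities at the answer position. The function names and the dict shape are hypothetical and do not belong to any provider's API.

The first function length-normalizes a whole answer as exp(mean logprob), which is the geometric mean of the token probabilities. The second renormalizes the answer token's probability over the returned alternatives. That approach suits single-token answers such as multiple-choice letters.

```python
import math
from typing import Sequence


def sequence_confidence(token_logprobs: Sequence[float]) -> float:
    """Length-normalized confidence for a generated answer.

    Assumes natural-log probabilities, one per generated token.
    exp(mean logprob) equals the geometric mean of the token probabilities,
    so long answers are not penalized just for having more tokens.
    """
    if not token_logprobs:
        raise ValueError("need at least one token logprob")
    mean_lp = sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_lp)


def answer_token_confidence(top_logprobs: dict[str, float], answer: str) -> float:
    """Share of probability mass on `answer` among the returned alternatives.

    `top_logprobs` is a hypothetical {token: natural-log prob} map for the
    single answer position (e.g. "A", "B", "C" for multiple choice).
    Returns 0.0 if `answer` is not among the alternatives.
    """
    probs = {tok: math.exp(lp) for tok, lp in top_logprobs.items()}
    total = sum(probs.values())
    return probs.get(answer, 0.0) / total


if __name__ == "__main__":
    print(round(sequence_confidence([math.log(0.9), math.log(0.5)]), 3))  # 0.671
    alts = {"A": math.log(0.7), "B": math.log(0.2), "C": math.log(0.05)}
    print(round(answer_token_confidence(alts, "A"), 3))  # 0.737
```

Renormalizing over only the top-k alternatives can never give a lower value than the raw token probability. If much of the probability mass falls outside the returned alternatives, treat the result as an upper bound.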
Journey Context:
LLMs are poorly calibrated when asked 'How confident are you from 1-100?': they often report high confidence for incorrect answers. Logit-based confidence, or prompting the model to articulate its uncertainty (e.g., 'List what you don't know'), yields better calibration than direct numerical self-assessment.
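One way to test the calibration claim on your own data is expected calibration error (ECE), a standard metric that this report does not itself name. ECE buckets predictions by confidence, then averages the gap between each bucket's accuracy and its mean confidence, weighted by bucket size.

The sketch below assumes you have a labeled set of answers and a confidence in [0, 1] for each one. A self-reported 1-100 score would be divided by 100 first. Lower ECE is better. Computing it for self-reported scores and for logprob-based scores on the same questions shows whether the alternative actually calibrates better for your model.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """ECE with equal-width confidence bins over [0, 1]."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equal-length, non-empty inputs")
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        if not 0.0 <= c <= 1.0:
            raise ValueError("confidences must be in [0, 1]")
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(ok for _, ok in b) / len(b)
        # Weight each bin's |accuracy - confidence| gap by its share of samples.
        ece += (len(b) / n) * abs(accuracy - avg_conf)
    return ece


if __name__ == "__main__":
    # Four answers self-reported at 95/100, only half of them correct.
    self_reported = [95 / 100] * 4
    print(round(expected_calibration_error(self_reported, [True, False, False, True]), 2))  # 0.45
```

The example reproduces the overconfidence pattern described above: every answer claims 0.95 confidence, but accuracy is only 0.5, giving an ECE of about 0.45.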
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:41:14.921953+00:00— report_created — created