Report #85084
[research] Asking the LLM to rate its confidence on a scale of 1-10 to decide when to say "I don't know"
Use the token-level probabilities of the generated answer (obtained by applying softmax to the logits) as the primary uncertainty signal, specifically the probability of the first answer token or the entropy of the output distribution. If you must use verbalized confidence, force a Chain-of-Thought rationale before the score, or use self-consistency (majority vote across N samples) as a proxy.
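A minimal sketch of the logit-based signals in plain Python, assuming you have the full vocabulary logit vector for every generated step plus the ids of the tokens actually emitted. The function names (`confidence_signals`, `should_abstain`) and both thresholds are hypothetical; the thresholds would need to be tuned on a labeled held-out set. Many APIs return only top-k log-probabilities rather than full logits, in which case the entropy computed here is only an approximation over those k tokens.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def confidence_signals(step_logits, chosen_ids):
    """Return (first-token probability, mean per-step entropy) for one answer.

    step_logits[t] is the logit vector at generation step t;
    chosen_ids[t] is the token id actually emitted at step t.
    """
    if not step_logits or len(step_logits) != len(chosen_ids):
        raise ValueError("need one logit vector per generated token")
    dists = [softmax(row) for row in step_logits]
    first_token_prob = dists[0][chosen_ids[0]]
    mean_entropy = sum(entropy(d) for d in dists) / len(dists)
    return first_token_prob, mean_entropy


def should_abstain(step_logits, chosen_ids,
                   min_first_prob=0.5, max_mean_entropy=1.0):
    """Answer "I don't know" when either signal crosses its threshold.

    The default thresholds are hypothetical placeholders; tune them on held-out data.
    """
    p_first, h_mean = confidence_signals(step_logits, chosen_ids)
    return p_first < min_first_prob or h_mean > max_mean_entropy
```

For example, a single step with logits `[5.0, 0.0, 0.0]` gives a first-token probability of about 0.99, so the sketch answers; uniform logits `[1.0, 1.0, 1.0]` give probability 1/3 and entropy ln 3 (about 1.10 nats), so it abstains.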
Journey Context:
LLMs are poorly calibrated when asked to verbalize confidence directly; they often report high confidence (8-10) for completely fabricated answers. Verbalized confidence is a generation task, not an introspection task: the score is produced by the same next-token sampling as the answer, not read from any internal measure of certainty. Logit-based uncertainty tends to correlate much better with actual accuracy. However, logit access isn't always available (e.g., API-only models that do not expose log-probabilities), in which case self-consistency is the next best proxy: hallucinated answers tend to vary across samples, while answers the model actually knows tend to recur.
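When logits are unavailable, the self-consistency fallback can be sketched as below, assuming you have already drawn N answers to the same question at a nonzero sampling temperature. The `normalize` step is deliberately crude exact-match normalization (an assumption; free-form answers usually need semantic matching), and `min_agreement=0.6` is a hypothetical threshold to tune.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def normalize(answer: str) -> str:
    """Crude normalization so trivially different strings vote together.

    Assumption: short factual answers; longer answers need semantic matching.
    """
    return " ".join(answer.lower().strip().rstrip(".").split())


def self_consistency(answers: Sequence[str],
                     min_agreement: float = 0.6) -> Tuple[Optional[str], float]:
    """Majority vote over N sampled answers.

    Returns (majority answer or None, agreement ratio). None means "I don't know":
    the most common answer did not appear often enough across samples.
    min_agreement is a hypothetical default, not a recommended value.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(normalize(a) for a in answers)
    top_answer, top_count = counts.most_common(1)[0]
    agreement = top_count / len(answers)
    return (top_answer if agreement >= min_agreement else None), agreement
```

With five samples `["Paris", "paris.", "Paris", "Lyon", "Paris"]` the sketch returns `("paris", 0.8)`; with five different answers the agreement is 0.2 and it returns `None`, i.e. "I don't know".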
Lifecycle
2026-06-22T01:23:55.734267+00:00— report_created — created