Report #7004
[research] Relying on an LLM's verbalized confidence to calibrate factuality
Use token probabilities (derived from the model's logits) or self-consistency sampling (temperature > 0, multiple generations) to estimate confidence, rather than asking the model to state its confidence level in natural language.
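As a rough illustration of the token-probability route, the sketch below turns a list of per-token log-probabilities into a single answer-level score. It assumes your inference API can return log-probabilities for the generated tokens; the function names `sequence_confidence` and `min_token_probability` are hypothetical helpers, not part of any library, and the geometric mean is only one of several reasonable aggregations.

```python
import math


def sequence_confidence(token_logprobs):
    """Geometric-mean token probability of a generated answer.

    token_logprobs: natural-log probabilities, one per generated token
    (assumed to come from whatever API exposes them).
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    # Averaging in log space and exponentiating gives the geometric mean,
    # which avoids penalizing longer answers the way a raw product would.
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_logprob)


def min_token_probability(token_logprobs):
    """Probability of the least likely token; a more pessimistic signal."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(min(token_logprobs))
```

A caller would compare the score against a threshold tuned on held-out data and fall back when it is too low; no specific threshold is implied here.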
Journey Context:
LLMs are poorly calibrated when asked to verbalize their confidence; they often claim high confidence for completely fabricated answers. Token probabilities derived from logits, or checking whether the model arrives at the same answer across multiple stochastic samples (self-consistency), provide a much more reliable signal for triggering an 'I don't know' fallback.
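The self-consistency check described above can be sketched as a majority vote over already-collected samples. This is a minimal sketch under stated assumptions: the samples are generated elsewhere at temperature > 0, answers are short enough that simple string normalization can match them, and the names `normalize`, `self_consistency`, and the 0.6 threshold are illustrative choices rather than recommended values.

```python
from collections import Counter

FALLBACK = "I don't know"


def normalize(answer):
    """Lowercase, collapse whitespace, and trim trailing periods/spaces."""
    return " ".join(answer.lower().split()).strip(" .")


def self_consistency(samples, threshold=0.6):
    """Return (answer, agreement) using a majority vote over samples.

    samples: answers from repeated stochastic generations of one prompt.
    threshold: hypothetical minimum share of samples that must agree.
    """
    if not samples:
        raise ValueError("need at least one sample")
    counts = Counter(normalize(s) for s in samples)
    top_answer, votes = counts.most_common(1)[0]
    agreement = votes / len(samples)
    # Low agreement suggests the model is guessing, so abstain.
    if agreement < threshold:
        return FALLBACK, agreement
    return top_answer, agreement
```

For example, `self_consistency(["Paris", "paris.", "Paris ", "Lyon", "Paris"])` yields `("paris", 0.8)`, while three mutually different answers fall below the threshold and return the fallback. Longer free-form answers would need a semantic comparison instead of exact matching after normalization.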
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T01:37:37.723999+00:00— report_created — created