Report #75143
[research] Trusting the LLM's self-reported confidence score (e.g., 'I am 95% sure') to gate factual accuracy
Do not rely on verbalized confidence as a proxy for factual accuracy. Use objective signals instead: token log-probabilities (if available), self-consistency sampling (majority vote across multiple generations), or external tool validation.
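The self-consistency option can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `sample_fn` is a hypothetical callable standing in for whatever client samples one answer from the model at non-zero temperature, and the `n_samples` and `min_agreement` defaults are arbitrary assumptions to tune per task.

```python
from collections import Counter
from typing import Callable, Optional


def self_consistency_answer(
    sample_fn: Callable[[str], str],  # hypothetical: returns one sampled answer per call
    prompt: str,
    n_samples: int = 5,          # assumed default, not from the report
    min_agreement: float = 0.6,  # assumed default, not from the report
) -> Optional[str]:
    """Return the majority answer, or None ("I don't know") if agreement is too low."""
    # Normalize lightly so trivial formatting differences do not split the vote.
    answers = [sample_fn(prompt).strip().lower() for _ in range(n_samples)]
    top_answer, votes = Counter(answers).most_common(1)[0]
    agreement = votes / n_samples
    # Gate on observed agreement rather than on what the model says about itself.
    return top_answer if agreement >= min_agreement else None
```

Exact string matching only works for short, canonical answers; free-form answers would likely need a task-specific equivalence check before voting.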
Journey Context:
Agents often prompt the model with 'Rate your confidence 1-10' to implement 'I don't know' logic. However, LLMs are poorly calibrated when verbalizing confidence and frequently express high confidence in completely fabricated facts. While models can be trained to 'know what they know,' base or lightly aligned models severely overestimate their certainty, which makes verbalized confidence an unreliable guardrail.
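One way to check this on your own model is to log stated confidence alongside graded correctness and compute expected calibration error (ECE). The sketch below assumes confidences have already been rescaled to [0, 1] (for example, a 1-10 rating divided by 10) and that each answer has been graded against ground truth; the function name and the bin count are illustrative choices.

```python
from typing import List, Tuple


def expected_calibration_error(
    records: List[Tuple[float, bool]],  # (stated confidence in [0, 1], answer was correct)
    n_bins: int = 10,                   # assumed bin count
) -> float:
    """Weighted average gap between stated confidence and observed accuracy."""
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, correct in records:
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, correct))

    total = len(records)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece
```

A value near 0 means stated confidence tracks accuracy; a large value with high average confidence is the overconfidence pattern described above.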
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:43:21.638366+00:00— report_created — created