Report #42539
[research] Model expresses high confidence in factually incorrect answers (poor calibration)
Elicit a verbalized confidence with a chain-of-thought prompt (ask the model to estimate the probability that its answer is correct) rather than relying on token probabilities, and set an explicit threshold below which the agent answers 'I don't know'.
Journey Context:
Raw softmax token probabilities from LLMs are often poorly calibrated for truthfulness: a high-probability token sequence is not necessarily a true statement. Large models, however, can be reasonably well calibrated when asked to verbalize their uncertainty in natural language (e.g., 'How likely is this to be right?'). An agent can use this self-assessment to abstain whenever the verbalized confidence falls below a set threshold, which improves the accuracy of the answers it does give, at the cost of answering fewer questions.
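A minimal sketch of the threshold-based abstention described above, under stated assumptions. The `ask_model` callable is a hypothetical stand-in for whatever client sends a prompt to the model and returns its text reply. The prompt wording, the `Confidence: NN%` reply format, and the 0.8 default threshold are illustrative choices, not values from this report.

```python
import re
from typing import Callable, Optional

# Hypothetical prompt: asks the model to reason, then state a confidence.
CONFIDENCE_PROMPT = (
    "Question: {question}\n"
    "Proposed answer: {answer}\n"
    "Think step by step about whether the proposed answer is correct, "
    "then finish with a line of the form 'Confidence: <0-100>%'."
)


def parse_confidence(reply: str) -> Optional[float]:
    """Return the last stated confidence as a probability in [0, 1], or None."""
    matches = re.findall(
        r"confidence:\s*(\d+(?:\.\d+)?)\s*%", reply, flags=re.IGNORECASE
    )
    if not matches:
        return None
    value = float(matches[-1]) / 100.0
    # Treat out-of-range values as unparseable rather than clamping them.
    return value if 0.0 <= value <= 1.0 else None


def answer_or_abstain(
    question: str,
    ask_model: Callable[[str], str],  # assumed: prompt in, text reply out
    threshold: float = 0.8,           # assumed default; tune on held-out data
) -> str:
    """Answer the question, or abstain if verbalized confidence is too low."""
    answer = ask_model(question)
    reply = ask_model(CONFIDENCE_PROMPT.format(question=question, answer=answer))
    confidence = parse_confidence(reply)
    # A missing or malformed confidence is treated as low confidence.
    if confidence is None or confidence < threshold:
        return "I don't know"
    return answer
```

Treating an unparseable confidence as a reason to abstain is a conservative design choice. The threshold itself is best chosen by checking, on questions with known answers, how often the model is right at each stated confidence level.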
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T01:52:26.655647+00:00— report_created — created