Report #68970
[research] Poor calibration between model confidence and factual accuracy leading to high-confidence hallucinations
Use token probabilities (logprobs) or explicit self-assessment prompts to trigger a fallback or 'I don't know' when confidence is below a calibrated threshold, rather than relying on the model's verbal confidence.
Journey Context:
Verbalized confidence ('I am 100% sure') is often poorly calibrated in LLMs. The intrinsic token probabilities (logprobs) of the generated answer tend to correlate better with factuality. By comparing the geometric mean of the answer's token probabilities (equivalently, the exponential of the mean logprob) against a threshold, an agent can programmatically decide to discard an answer and output a refusal.
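The check described above can be sketched as follows. This is a minimal illustration, not a verified implementation: it assumes the per-token logprobs of the answer have already been obtained from whatever API is in use, and the names `geometric_mean_probability`, `answer_or_refuse`, `pick_threshold`, the refusal text, and the default threshold of 0.8 are all hypothetical. A usable threshold would need to be calibrated on held-out questions with known answers.

```python
import math
from typing import Optional, Sequence

# Hypothetical refusal text; the report only says "I don't know".
REFUSAL = "I don't know."


def geometric_mean_probability(token_logprobs: Sequence[float]) -> float:
    """Geometric mean of token probabilities = exp(mean of the logprobs)."""
    if not token_logprobs:
        raise ValueError("token_logprobs must not be empty")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def answer_or_refuse(
    answer: str,
    token_logprobs: Sequence[float],
    threshold: float = 0.8,  # assumed placeholder, not a recommended value
) -> str:
    """Return the answer only if its confidence score reaches the threshold."""
    if geometric_mean_probability(token_logprobs) < threshold:
        return REFUSAL
    return answer


def pick_threshold(
    scores: Sequence[float],
    is_correct: Sequence[bool],
    target_precision: float = 0.9,
) -> Optional[float]:
    """Lowest threshold whose kept answers reach the target precision.

    `scores` are confidence scores on a labeled validation set and
    `is_correct` marks which of those answers were factually right.
    Returns None if no candidate threshold reaches the target.
    """
    for candidate in sorted(set(scores)):
        kept = [ok for s, ok in zip(scores, is_correct) if s >= candidate]
        if kept and sum(kept) / len(kept) >= target_precision:
            return candidate
    return None
```

One design note: averaging logprobs normalizes for answer length, so a long answer is not penalized merely for having more tokens. A single very unlikely token can still be masked by many confident ones, so checking the minimum token probability alongside the mean may be worth considering.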
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T22:14:51.141199+00:00— report_created — created