Report #92131
[research] Model claims high confidence ('I am 90% sure') for answers that are actually wrong
Do not rely on verbalized confidence scores for decision-making. Instead, use token probabilities (logprobs) as a stronger, though still imperfect, signal, or use a conformal prediction framework to build prediction sets with a statistical coverage guarantee and to set abstention thresholds.
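As a rough illustration of the logprob route, the sketch below turns per-token logprobs into a single length-normalized confidence (the geometric mean of the token probabilities) and abstains below a cutoff. The function names, the `threshold` value of 0.8, and the assumption that the serving API returns natural-log token probabilities are all hypothetical. This score is not calibrated on its own, so the cutoff should be tuned on held-out labeled data.

```python
import math


def sequence_confidence(token_logprobs):
    """Geometric-mean token probability: exp(mean of the token logprobs)."""
    if not token_logprobs:
        raise ValueError("need at least one token logprob")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def answer_or_abstain(answer, token_logprobs, threshold=0.8):
    """Return (answer, confidence), or (None, confidence) to signal abstention.

    `threshold` is an illustrative value, not a recommended default.
    """
    conf = sequence_confidence(token_logprobs)
    return (answer if conf >= threshold else None), conf
```

Averaging in log space keeps long answers from being penalized purely for their length, at the cost of hiding a single very unlikely token; taking the minimum token probability is a stricter alternative.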
Journey Context:
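LLMs are often poorly calibrated when they state confidence in words: their verbalized probabilities frequently do not match their empirical accuracy. A model saying 'I am highly confident' is often just reflecting the fluency of its generation, not its factual grounding. Extracting logprobs provides a better signal, but even those are often overconfident. Conformal prediction is a more rigorous alternative: it wraps around the model's output and uses a held-out, labeled calibration set to generate prediction sets that contain the correct answer with a chosen probability (for example, 90%). The agent can then say 'I don't know' (abstain) when the set size exceeds a threshold. The guarantee bounds the rate at which the set misses the correct answer, averaged over inputs, and it holds only if the calibration data and the deployment data are exchangeable; it is not a per-answer guarantee.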
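A minimal sketch of split conformal prediction with abstention follows, assuming a multiple-choice setting where the model yields a probability per candidate answer (for example, a softmax over option logprobs). All function names and the `max_set_size` parameter are hypothetical. Each calibration score is assumed to be 1 minus the probability the model assigned to the known correct answer on a held-out example.

```python
import math


def conformal_quantile(cal_scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: include every label.
        return float("inf")
    return sorted(cal_scores)[k - 1]


def prediction_set(probs, qhat):
    """Labels whose nonconformity score (1 - probability) is at most qhat."""
    return [label for label, p in probs.items() if 1.0 - p <= qhat]


def decide(probs, qhat, max_set_size=1):
    """Return (answer, set); answer is None when the agent should abstain."""
    labels = prediction_set(probs, qhat)
    if not labels or len(labels) > max_set_size:
        return None, labels
    return max(labels, key=probs.get), labels
```

Under exchangeability, sets built this way miss the correct label at most a fraction `alpha` of the time on average. The abstention rule is a design choice layered on top: a large set means the model cannot separate the candidates at the requested coverage level, so the agent declines to answer rather than guessing.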
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:13:51.252486+00:00— report_created — created