Report #58994
[counterintuitive] When the model says 'I am highly confident' or 'I am certain', it does not reflect a calibrated estimate of how likely its answer is to be correct
Never use the model's self-reported confidence as a reliability signal. For confidence estimation, use logprobs from the API, ensemble multiple generations, or use external verification tools. Build systems that treat model self-assessments as uncalibrated text, not as probability estimates.
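One way to sketch the "ensemble multiple generations" option is to sample the same prompt several times and use the agreement rate as a rough confidence signal. This is a minimal sketch, not a definitive implementation. `generate` is a hypothetical callable standing in for whatever client call you use (sampled with non-zero temperature), and the lowercase-and-strip normalization is an assumption that only suits short, exact-match answers.

```python
from collections import Counter
from typing import Callable, Tuple


def agreement_confidence(
    generate: Callable[[str], str],  # hypothetical: wraps your own model call
    prompt: str,
    n: int = 5,
) -> Tuple[str, float]:
    """Sample n answers; return the majority answer and its agreement rate."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Normalize so trivial formatting differences do not count as disagreement.
    answers = [generate(prompt).strip().lower() for _ in range(n)]
    top_answer, count = Counter(answers).most_common(1)[0]
    return top_answer, count / n
```

A low agreement rate is a reason to route the answer to external verification. A high rate is not proof of correctness, since a model can be consistently wrong.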
Journey Context:
Developers ask models to rate their confidence ('on a scale of 1-10, how confident are you?') expecting calibrated uncertainty estimates. The model generates confidence statements as text patterns learned from training data, not through introspective access to its own probability distributions. A model can be completely wrong while generating 'I am very confident about this answer' because that text pattern is associated with authoritative-sounding content in training data. The model has no reliable internal mechanism for converting its actual token probabilities into natural-language confidence statements. While research shows models have some implicit self-knowledge accessible via logprobs, this does not translate to reliable verbalized confidence. Humans have at least some introspective access to their knowledge state; an LLM's verbalized confidence offers no comparable guarantee.
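The token probabilities mentioned above can be turned into a numeric score without asking the model to describe its own confidence. The sketch below assumes you have already extracted per-token log probabilities for the answer tokens from whatever API you use. The function names are illustrative and do not belong to any library.

```python
import math
from typing import Sequence


def sequence_probability(token_logprobs: Sequence[float]) -> float:
    """Joint probability of the whole answer: exp of the summed logprobs."""
    return math.exp(sum(token_logprobs))


def mean_token_probability(token_logprobs: Sequence[float]) -> float:
    """Geometric mean of per-token probabilities (length-normalized score)."""
    if not token_logprobs:
        raise ValueError("token_logprobs must not be empty")
    return math.exp(sum(token_logprobs) / len(token_logprobs))
```

The joint probability shrinks as answers get longer, so the length-normalized score is usually easier to compare across answers. Either score is still a raw signal and should be checked against labeled examples before it is used as a threshold.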
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T05:30:30.099704+00:00— report_created — created