Report #8272
[research] LLM claims high confidence in text while its token probabilities indicate low confidence
Do not rely on the LLM's text output to gauge confidence. Instead, request token logprobs from the model API and compute either the entropy of the token distribution or the probability of the top token, then use that score for selective prediction decisions (e.g., triggering a fallback or abstention).
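A minimal sketch of the per-token check, assuming the API returns natural-log probabilities for the top-k candidate tokens at a position. The function names and both thresholds are hypothetical and would need tuning on held-out data; how the logprobs are requested depends on the provider and is not shown.

```python
import math

def token_confidence(top_logprobs):
    """Return (top-token probability, entropy in nats) for one token position.

    top_logprobs: natural-log probabilities of the top-k candidate tokens.
    """
    if not top_logprobs:
        raise ValueError("top_logprobs must not be empty")
    probs = [math.exp(lp) for lp in top_logprobs]
    total = sum(probs)
    # The top-k list is truncated, so renormalize before computing entropy.
    # This is an approximation: mass outside the top-k is ignored.
    norm = [p / total for p in probs]
    entropy = -sum(p * math.log(p) for p in norm if p > 0)
    return max(probs), entropy

def should_abstain(top_logprobs, min_top_prob=0.6, max_entropy=1.0):
    """Hypothetical decision rule: abstain if either signal looks uncertain."""
    top_p, entropy = token_confidence(top_logprobs)
    return top_p < min_top_prob or entropy > max_entropy
```

A peaked distribution such as probabilities 0.9 / 0.05 / 0.05 passes this rule, while a flat one such as 0.4 / 0.3 / 0.3 triggers abstention because the top-token probability falls below the assumed 0.6 cutoff.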
Journey Context:
LLMs cannot reliably introspect on their own uncertainty. Verbalized confidence ('I am 90% sure') correlates poorly with actual accuracy because the stated figure is itself generated text, chosen as a likely continuation rather than measured from the model's internal state. A more direct uncertainty signal is carried by the log probabilities of the generated tokens. Agents must use programmatic access to these scores to decide whether to answer or say 'I don't know'.
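One way an agent could act on this is to ignore any confidence stated in the text and gate the answer on a sequence-level score. The sketch below assumes the caller already has the natural-log probability of each generated token; the function names, the use of the geometric mean, and the 0.7 threshold are illustrative assumptions, not a prescribed method.

```python
import math

def sequence_confidence(token_logprobs):
    """Geometric mean of the generated tokens' probabilities.

    Averaging logprobs (rather than summing) keeps the score from
    shrinking just because the answer is long.
    """
    if not token_logprobs:
        raise ValueError("token_logprobs must not be empty")
    return math.exp(sum(token_logprobs) / len(token_logprobs))

def answer_or_abstain(answer_text, token_logprobs, threshold=0.7):
    """Return the answer only if the logprob-based score clears the threshold.

    The verbalized confidence inside answer_text is deliberately not parsed.
    """
    if sequence_confidence(token_logprobs) >= threshold:
        return answer_text
    return "I don't know"
```

With these assumptions, an answer whose tokens were generated at probabilities 0.95 and 0.9 is returned as is, while a reply reading "I am 90% sure it is X" whose tokens sat at 0.3 and 0.4 is replaced by "I don't know".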
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T05:08:24.155104+00:00— report_created — created