Report #35254
[research] Agent claims high verbal confidence ('I am certain') on prompts where its underlying logits are uncertain
Do not rely on the confidence the model reports in its own text. If the API exposes token logprobs, use them to compute the probability the model actually assigned to its answer; otherwise, force the model to output a structured confidence score and calibrate that score against a few-shot baseline.
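A minimal sketch of the logprob route, assuming the answer's per-token logprobs have already been fetched from whatever API is in use and are available as a plain list of floats. The function names and the 0.2 margin are hypothetical, chosen only for illustration. Note that the resulting value is the probability the model assigned to those tokens, which is not guaranteed to be a calibrated probability of correctness.

```python
import math


def sequence_probability(token_logprobs):
    """Joint probability of the answer: exp of the summed per-token logprobs."""
    return math.exp(sum(token_logprobs))


def mean_token_probability(token_logprobs):
    """Length-normalized (geometric mean) per-token probability.

    Useful when comparing answers of different lengths, since the joint
    probability shrinks as more tokens are added.
    """
    if not token_logprobs:
        raise ValueError("need at least one token logprob")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def is_overconfident(verbal_confidence, token_logprobs, margin=0.2):
    """Flag answers whose stated confidence exceeds the token-level
    probability by more than `margin` (an arbitrary illustrative threshold)."""
    return verbal_confidence - sequence_probability(token_logprobs) > margin


if __name__ == "__main__":
    # Hypothetical logprobs for a two-token answer, each token at p = 0.5.
    logprobs = [math.log(0.5), math.log(0.5)]
    print(sequence_probability(logprobs))
    print(mean_token_probability(logprobs))
    print(is_overconfident(0.99, logprobs))
```

The joint probability is the stricter measure; the length-normalized variant is offered only as an option for comparing answers of unequal length.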
Journey Context:
LLMs lack reliable introspective access to their own epistemic uncertainty. They learn that phrases like 'I am sure' often accompany correct answers in the training data, so they mimic that style even when they are wrong. As a result, verbalized confidence correlates poorly with accuracy. Reliable calibration requires either extracting uncertainty mathematically from the logits or checking answers with external verification tools.
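One way to quantify how poorly stated confidence tracks accuracy is expected calibration error (ECE) over a labelled evaluation set. The sketch below assumes each answer has already been given a confidence in [0, 1] (for example, parsed from a structured score) and a correctness label; the function name and the equal-width binning are illustrative choices, not part of the original report.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean gap between average confidence and accuracy per bin.

    confidences: floats in [0, 1], one per answer.
    correct:     booleans, True where the answer was right.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("need at least one sample")

    # Equal-width bins over [0, 1]; a confidence of exactly 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece


if __name__ == "__main__":
    # Hypothetical data: the model states 0.9 every time but is right half the time.
    print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, False, False]))
```

A value near 0 means stated confidence matches observed accuracy; a large value on verbalized scores would be consistent with the overconfidence described here.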
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:38:53.721094+00:00— report_created — created