Report #66443
[research] Relying on an LLM's text output ('I am 90% sure') as a calibrated measure of its actual factual confidence
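Request token log-probabilities (logprobs) from the model API and read them for the tokens that carry the core factual claim (for example, the entity or number being asserted) to estimate the model's statistical confidence. Do not prompt the model to self-report a confidence score: verbalized confidence is poorly calibrated, tends toward overconfidence, and can be weakly or even inversely correlated with accuracy.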
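A minimal sketch of the extraction step, assuming the API returns one (token text, logprob) pair per generated token and that the token texts concatenate back to the exact answer string. With OpenAI's Chat Completions API, for instance, such pairs are exposed under `choices[0].logprobs.content` when the request sets `logprobs=True`; other providers differ. The helper names `claim_token_span` and `claim_confidence`, and the choice to locate the claim by plain string match, are illustrative assumptions, not part of any library.

```python
import math
from typing import Dict, List, Tuple

# Assumed input shape: one (token_text, logprob) pair per generated token.
TokenLogprob = Tuple[str, float]


def claim_token_span(tokens: List[TokenLogprob], claim: str) -> List[TokenLogprob]:
    """Hypothetical helper: return the tokens overlapping the first occurrence of `claim`.

    Assumes the concatenated token texts reproduce the generated answer exactly.
    """
    if not claim:
        raise ValueError("claim must be non-empty")
    text = "".join(tok for tok, _ in tokens)
    start = text.find(claim)
    if start < 0:
        raise ValueError(f"claim {claim!r} not found in generated text")
    end = start + len(claim)

    selected = []
    offset = 0
    for tok, lp in tokens:
        tok_start, tok_end = offset, offset + len(tok)
        # Keep any token whose character range overlaps [start, end).
        if tok_start < end and tok_end > start:
            selected.append((tok, lp))
        offset = tok_end
    return selected


def claim_confidence(tokens: List[TokenLogprob], claim: str) -> Dict[str, object]:
    """Hypothetical helper: summarize the model's confidence over the claim tokens."""
    span = claim_token_span(tokens, claim)
    logprobs = [lp for _, lp in span]
    total = sum(logprobs)
    return {
        "tokens": [tok for tok, _ in span],
        # Product of per-token conditionals = P(claim tokens | preceding text).
        "joint_prob": math.exp(total),
        # Length-normalized (geometric mean) probability per token.
        "geo_mean_prob": math.exp(total / len(logprobs)),
        # Probability of the single least certain token in the claim.
        "min_prob": math.exp(min(logprobs)),
    }
```

Because each logprob is conditional on the tokens before it, summing over the span gives the log-probability of the whole claim given the prefix. The geometric mean removes the penalty that the joint probability places on longer claims, and the minimum surfaces the one token the model was least sure of, which is often the fact itself.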
Journey Context:
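Prompting 'think step by step and rate your confidence' feels intuitive but is unreliable, because LLMs tend to reproduce the human linguistic patterns of expressing uncertainty found in their training data rather than report their internal probability distribution. A model might state 'I am highly confident' while assigning low probability to the tokens that carry the fact. Token logprobs give a quantitative signal read directly from the model's output distribution, though how well calibrated that signal is for a given model should still be checked against labeled data.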
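To compare the two signals on a specific task rather than take the calibration claim on trust, one option is to score each with expected calibration error (ECE) on questions with known answers. The sketch below uses the common equal-width binning variant; the function name and the 10-bin default are choices made here for illustration, not something the report specifies.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Equal-width-bin ECE: bin-size-weighted mean of |accuracy - mean confidence|."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equal-length, non-empty inputs")
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        if not 0.0 <= c <= 1.0:
            raise ValueError(f"confidence {c} outside [0, 1]")
        # Confidence 1.0 would map to index n_bins, so clamp it into the top bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, y))

    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, y in b if y) / len(b)
        ece += (len(b) / n) * abs(accuracy - avg_conf)
    return ece
```

Call it once with the parsed verbalized scores and once with logprob-derived probabilities for the same questions and correctness labels; on that data, the signal with the lower ECE is the better calibrated one.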
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T18:00:27.127566+00:00— report_created — created