Report #10571
[research] Relying on an LLM's text output to gauge factual confidence
Extract token probabilities (logprobs) from the model API for the core factual claim, or use multi-sampling (generate N times, check variance). Treat verbalized confidence as highly unreliable.
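A minimal sketch of the logprob route, assuming the API has already returned per-token log-probabilities for the span that carries the factual claim. The function name `claim_confidence` and the example values are hypothetical, and how the logprobs are fetched depends on the provider, so that step is left out here.

```python
import math


def claim_confidence(token_logprobs):
    """Summarize per-token logprobs for one claim span.

    Returns (joint_prob, mean_token_prob):
      joint_prob      - probability of the whole span (product of token probs)
      mean_token_prob - geometric mean per token, less sensitive to span length
    """
    if not token_logprobs:
        raise ValueError("need at least one token logprob")
    total = sum(token_logprobs)
    joint_prob = math.exp(total)
    mean_token_prob = math.exp(total / len(token_logprobs))
    return joint_prob, mean_token_prob


# Hypothetical logprobs for the tokens of a short answer span.
example_logprobs = [-0.05, -0.10, -0.02]
joint, mean = claim_confidence(example_logprobs)
print(f"joint={joint:.3f} mean_per_token={mean:.3f}")
```

Any threshold applied to these numbers is a judgment call and would need calibrating against labeled answers for the specific model and task.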
Journey Context:
LLMs are trained to sound confident and helpful, so their verbalized uncertainty correlates poorly with actual accuracy. A model may say 'I am highly confident' about a completely fabricated fact. Logit-based confidence and self-consistency checks instead measure the model's output distribution directly, which tends to track factuality much better than self-reported confidence.
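The multi-sampling check can be sketched as follows, assuming the same prompt has already been sampled N times at a nonzero temperature and the short answers collected into a list. `self_consistency` is a hypothetical helper, and the normalization (strip and lowercase) is a deliberately naive assumption; free-form answers would need a stronger equivalence test.

```python
from collections import Counter


def self_consistency(answers):
    """Return (majority_answer, agreement) for N sampled answers.

    agreement is the fraction of samples matching the most common answer;
    low agreement suggests the model is unsure of the fact.
    """
    # Naive normalization (assumption): exact match after strip + lowercase.
    normalized = [a.strip().lower() for a in answers]
    if not normalized:
        raise ValueError("need at least one sampled answer")
    top_answer, top_count = Counter(normalized).most_common(1)[0]
    return top_answer, top_count / len(normalized)


# Hypothetical answers from five samples of the same question.
samples = ["Paris", "paris ", "Paris", "Lyon", "Paris"]
answer, agreement = self_consistency(samples)
print(answer, agreement)
```

High agreement is not proof of correctness, since a model can be consistently wrong, so this works best as a filter for flagging low-agreement answers for review.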
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T11:09:05.680122+00:00— report_created — created