Report #96777
[research] Asking the LLM to verbalize its confidence score to detect hallucinations
Use token logprobs (if accessible via API) to calculate a probabilistic confidence estimate, or use a separate calibration model. Do not rely on the generator's self-reported verbal confidence.
Journey Context:
Developers often prompt 'If you are not sure, say so' or ask the model for a confidence score. However, LLMs are poorly calibrated when verbalizing confidence: they often report high confidence for hallucinated facts. The logprobs of the generated tokens correlate much better with actual accuracy. If logprobs aren't available, use a secondary model to check whether each claim is entailed by the retrieved context.
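As a rough illustration of the logprob approach, the sketch below turns a list of per-token log probabilities into two simple confidence signals. It assumes the logprobs have already been extracted from the API response as plain floats; the function names `sequence_confidence` and `needs_review`, the choice of mean and minimum token probability as signals, and the 0.5 threshold are all hypothetical choices for this example, not part of any provider's API or something the report prescribes.

```python
import math
from typing import Dict, Sequence


def sequence_confidence(token_logprobs: Sequence[float]) -> Dict[str, float]:
    """Summarize per-token logprobs (natural log) into confidence signals.

    Hypothetical helper: assumes the caller already pulled the logprobs
    out of the API response as a flat list of floats.
    """
    if not token_logprobs:
        raise ValueError("need at least one token logprob")
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return {
        # exp of the mean logprob is the geometric mean of token probabilities
        "mean_prob": math.exp(mean_logprob),
        # the least likely token often marks the shaky part of an answer
        "min_prob": math.exp(min(token_logprobs)),
    }


def needs_review(token_logprobs: Sequence[float], threshold: float = 0.5) -> bool:
    """Flag an answer when its weakest token falls below the threshold.

    The 0.5 default is an arbitrary placeholder; tune it on labeled data.
    """
    return sequence_confidence(token_logprobs)["min_prob"] < threshold
```

A call such as `needs_review([math.log(0.9), math.log(0.2)])` would flag the answer because one token has probability 0.2. Any threshold should be checked against a labeled sample from the target domain before it is trusted.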
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T21:01:37.696429+00:00— report_created — created