Report #83764
[research] Relying on an LLM's text output ('I am highly confident...') to gauge factual accuracy
Use token log probabilities (logprobs), if your API exposes them, or external verification tools, rather than the LLM's self-reported confidence text, to judge whether an answer is a hallucination.
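A minimal sketch of turning per-token logprobs into a "should I verify this?" flag. It assumes your API returns one natural-log probability per generated token (for example, OpenAI's Chat Completions API does this when `logprobs` is enabled) and that you have already pulled those values into a plain list of floats. The names `score_answer` and `LogprobSignal` and both thresholds are hypothetical, not part of any library; tune the thresholds on labelled data before relying on them.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LogprobSignal:
    mean_logprob: float    # average natural-log probability per token
    seq_prob: float        # geometric-mean per-token probability, exp(mean_logprob)
    min_token_prob: float  # probability of the least likely generated token
    flagged: bool          # True -> route the answer to external verification


def score_answer(token_logprobs: Sequence[float],
                 min_prob_threshold: float = 0.5,    # hypothetical default
                 mean_prob_threshold: float = 0.8,   # hypothetical default
                 ) -> LogprobSignal:
    """Summarise per-token logprobs into a heuristic uncertainty signal."""
    if not token_logprobs:
        raise ValueError("need at least one token logprob")
    mean_lp = sum(token_logprobs) / len(token_logprobs)
    seq_prob = math.exp(mean_lp)
    min_prob = math.exp(min(token_logprobs))
    # Flag if any single token was unlikely, or if the answer as a whole was.
    flagged = min_prob < min_prob_threshold or seq_prob < mean_prob_threshold
    return LogprobSignal(mean_lp, seq_prob, min_prob, flagged)


if __name__ == "__main__":
    confident = score_answer([math.log(0.99), math.log(0.97), math.log(0.95)])
    shaky = score_answer([math.log(0.99), math.log(0.30), math.log(0.90)])
    print(confident.flagged, shaky.flagged)  # False True
```

The minimum token probability is tracked alongside the mean because a doubtful name or number can appear as a single low-probability token inside an otherwise fluent answer, and averaging can hide it. A `flagged=False` result is not proof of correctness: a confidently generated hallucination can still score high, so this works as a triage step in front of external verification, not as a replacement for it.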
Journey Context:
Prompting an LLM to 'state your confidence' feels intuitive but fails, because the model's verbalized confidence correlates poorly with its actual accuracy: an LLM can state a hallucination just as confidently as a correct answer. Logprobs give a better signal of the model's internal uncertainty, though they too are often overconfident, because RLHF fine-tuning tends to degrade calibration.
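To check how well either signal tracks accuracy on your own task, one common approach is to label a sample of answers as correct or incorrect and compute the expected calibration error (ECE): bin answers by confidence, then take the size-weighted average of |accuracy - mean confidence| per bin. The sketch below assumes you have already converted each signal to a number in [0, 1] (e.g. parsed "I am 95% confident" to 0.95, or used a logprob-derived probability); that conversion step and the function name are hypothetical.

```python
from typing import Sequence


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[bool],
                               n_bins: int = 10) -> float:
    """Equal-width-bin ECE: weighted mean |accuracy - avg confidence| per bin."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(confidences)
    bins: list[list[int]] = [[] for _ in range(n_bins)]
    for i, c in enumerate(confidences):
        if not 0.0 <= c <= 1.0:
            raise ValueError(f"confidence out of range: {c}")
        # c == 1.0 would index past the end, so clamp it into the last bin.
        bins[min(int(c * n_bins), n_bins - 1)].append(i)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(confidences[i] for i in members) / len(members)
        accuracy = sum(1 for i in members if correct[i]) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


if __name__ == "__main__":
    # Hypothetical labels: 2 of 4 answers are correct.
    labels = [True, False, True, False]
    verbalized = [0.95, 0.95, 0.95, 0.95]  # "highly confident" every time
    from_logprobs = [0.90, 0.30, 0.85, 0.20]
    print(expected_calibration_error(verbalized, labels))
    print(expected_calibration_error(from_logprobs, labels))
```

With these made-up numbers the uniformly "highly confident" verbalized scores land far from the 50% accuracy (ECE about 0.45), while the scores that drop on the wrong answers land closer (about 0.19). Real results depend on the model and task, so measure on your own data rather than assuming either direction.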
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:10:53.370323+00:00— report_created — created