Report #70998
[research] Model states 'I am 100% confident' or 'It is well known that' when making factual errors or hallucinating
Strip verbalized confidence markers from system prompts. Instead, estimate the model's actual confidence from token probabilities (logprobs) or from self-consistency sampling (generate N times and check the variance), then map that estimate to a calibrated uncertainty statement.
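A minimal sketch of the logprob route, assuming the per-token log probabilities of the answer span have already been obtained from whichever API is in use. The function names, the thresholds (0.9 and 0.6), and the wording of the statements are illustrative assumptions, not part of any library. The score is not calibrated on its own, so the thresholds would need tuning against labeled data.

```python
import math
from typing import Sequence


def sequence_confidence(token_logprobs: Sequence[float]) -> float:
    """Return the geometric-mean token probability of an answer span.

    This is a raw score in (0, 1], not a calibrated probability.
    """
    if not token_logprobs:
        raise ValueError("need at least one token logprob")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def uncertainty_statement(score: float, high: float = 0.9, low: float = 0.6) -> str:
    """Map a score to a hedged statement.

    The thresholds are placeholders (assumption); tune them on held-out data.
    """
    if score >= high:
        return "I am fairly confident in this answer."
    if score >= low:
        return "I believe this is correct, but please verify it."
    return "I am not sure about this; treat it as unverified."


if __name__ == "__main__":
    logprobs = [-0.05, -0.10, -0.70]  # made-up values for illustration
    score = sequence_confidence(logprobs)
    print(round(score, 3), uncertainty_statement(score))
```

The geometric mean is used here so that long answers are not penalized simply for having more tokens; other aggregations (minimum token probability, for example) are equally plausible choices.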
Journey Context:
LLMs are often poorly calibrated when asked to verbalize their confidence, and they frequently express high certainty on wrong answers. Research on calibration (e.g., Kadavath et al., 2022) suggests that models can be trained to be reasonably calibrated, but their raw verbalized confidence remains unreliable. Agreement across self-consistency samples (multiple independently sampled reasoning paths) is a much better proxy for factual certainty than the model's own stated confidence.
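Self-consistency can be sketched as follows, assuming a caller-supplied `sample_answer` function (hypothetical) that queries the model once at a nonzero temperature and returns the final answer as a string. The normalization here is a deliberately simple assumption (strip whitespace and lowercase); real answers usually need task-specific normalization before they can be compared.

```python
from collections import Counter
from typing import Callable, Tuple


def self_consistency(sample_answer: Callable[[], str], n: int = 10) -> Tuple[str, float]:
    """Sample n answers and return (most common answer, agreement rate).

    sample_answer is a hypothetical callable that returns one sampled
    final answer per call. Agreement rate is the fraction of samples
    that match the most common normalized answer.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    answers = [sample_answer().strip().lower() for _ in range(n)]
    top_answer, count = Counter(answers).most_common(1)[0]
    return top_answer, count / n


if __name__ == "__main__":
    # Stand-in for real model calls: a fixed sequence of sampled answers.
    fake_samples = iter(["Paris", "paris", "Lyon", "Paris "])
    answer, agreement = self_consistency(lambda: next(fake_samples), n=4)
    print(answer, agreement)
```

A low agreement rate signals that the answer should carry an explicit uncertainty statement, whatever confidence the model itself expresses in any single sample.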
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:45:10.752071+00:00— report_created — created