Report #97956
[research] LLM gives wrong answers with high confidence or fails to signal when it is guessing.
Elicit calibrated uncertainty explicitly: ask for a probability or confidence phrase, apply thresholds to token log-probabilities or self-consistency agreement, and route low-confidence answers to verification or abstention.
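A minimal sketch of the self-consistency variant of this routing, assuming the same prompt has already been sampled several times and the answers are available as plain strings. The function names `self_consistency` and `route`, and the thresholds 0.8 and 0.5, are hypothetical placeholders chosen for illustration and would need tuning on held-out data; no particular model API is implied.

```python
from collections import Counter


def self_consistency(samples):
    """Return (majority_answer, agreement) for a list of sampled answers.

    agreement is the fraction of samples that match the majority answer.
    """
    if not samples:
        raise ValueError("need at least one sampled answer")
    # Light normalization so trivial formatting differences do not split votes.
    normalized = [s.strip().lower() for s in samples]
    answer, votes = Counter(normalized).most_common(1)[0]
    return answer, votes / len(normalized)


def route(samples, accept_at=0.8, verify_at=0.5):
    """Decide what to do with an answer based on sample agreement.

    accept_at and verify_at are assumed thresholds, not recommended values.
    """
    answer, agreement = self_consistency(samples)
    if agreement >= accept_at:
        return ("answer", answer, agreement)
    if agreement >= verify_at:
        # Plausible but not trusted: hand off to an external check.
        return ("verify", answer, agreement)
    # Too little agreement: decline to answer rather than guess.
    return ("abstain", None, agreement)


if __name__ == "__main__":
    print(route(["Paris", "paris ", " Paris", "Paris", "Lyon"]))
    print(route(["a", "b", "a", "c"]))
    print(route(["a", "b", "c"]))
```

The same `route` shape would work with a verbalized probability or an averaged token log-probability in place of `agreement`; only the score source changes.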
Journey Context:
Kadavath et al. showed that LLMs mostly know what they know: a model's P(IK) score, its estimated probability that it knows the answer, predicts its accuracy. Lin et al. demonstrated that models can be trained to express uncertainty in words and remain reasonably calibrated. The catch is that post-RLHF models can become overconfident, so confidence signals should be combined with external verification rather than used alone.
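One way to check whether a confidence signal is trustworthy enough to use is to measure its calibration on a labeled set. The sketch below computes a simple binned expected calibration error (ECE); the function name, the equal-width binning, and the default of 10 bins are assumptions for illustration, not necessarily the exact procedure used in the cited papers. An overconfident model shows up as a large gap between stated confidence and observed accuracy.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |avg confidence - accuracy| per bin.

    confidences: floats in [0, 1], one per answer.
    correct: booleans, True where the answer was right.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have equal length")
    if not confidences:
        raise ValueError("need at least one example")

    # Equal-width bins over [0, 1]; a confidence of 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        # Weight each bin's gap by the share of examples it holds.
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece


if __name__ == "__main__":
    # Stated 90% confidence, but only 3 of 4 answers are right.
    print(expected_calibration_error([0.9] * 4, [True, True, True, False]))
```

A low ECE on held-out data supports using the signal for thresholding; a high one is a reason to lean more heavily on external verification.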
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T04:59:16.200161+00:00— report_created — created