Report #5452
[research] LLM answers questions with high confidence even when it is likely wrong
Implement selective prediction: train or prompt the model to output 'I don't know' or abstain when the predicted probability of correctness is below a calibrated threshold.
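A minimal sketch of the abstention rule described above, assuming a confidence estimate is already available for each answer. The names `Prediction`, `selective_answer`, and `ABSTAIN` are hypothetical and chosen for illustration. How the confidence is obtained (token log-probabilities, a separate verifier, self-reported scores) is outside this sketch.

```python
from dataclasses import dataclass

# Hypothetical abstention string; the report only says "output 'I don't know' or abstain".
ABSTAIN = "I don't know"


@dataclass
class Prediction:
    answer: str
    confidence: float  # assumed: estimated probability that `answer` is correct, in [0, 1]


def selective_answer(pred: Prediction, threshold: float) -> str:
    """Return the answer only if its confidence clears the threshold, else abstain."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    return pred.answer if pred.confidence >= threshold else ABSTAIN


if __name__ == "__main__":
    # Illustrative values only, not measured results.
    print(selective_answer(Prediction("Paris", 0.93), threshold=0.8))
    print(selective_answer(Prediction("Lyon", 0.41), threshold=0.8))
```

The threshold is only meaningful if the confidence values are calibrated, so it should be tuned on held-out labeled data rather than set by hand.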
Journey Context:
Standard LLMs are poorly calibrated; their confidence scores do not reliably predict accuracy. The CALM framework shows that allowing models to abstain on uncertain inputs dramatically reduces hallucination rates without degrading performance on known facts. The tradeoff is coverage (answering fewer questions), but for high-stakes domains, precision is more important than recall.
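The coverage versus precision tradeoff can be made concrete by choosing the threshold on a labeled validation set. The sketch below is one simple way to do that, not the method of any particular framework: it scans candidate thresholds and keeps the one that answers the most questions while still meeting a target precision. The function name `pick_threshold` and the toy data are hypothetical.

```python
from typing import List, Optional, Tuple


def pick_threshold(
    confidences: List[float],
    correct: List[bool],
    target_precision: float,
) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, coverage, precision) with the highest coverage whose
    answered subset meets target_precision, or None if no threshold does.

    An item counts as answered when its confidence is >= threshold.
    """
    pairs = sorted(zip(confidences, correct), key=lambda p: p[0], reverse=True)
    total = len(pairs)
    best = None
    n_correct = 0
    for i, (conf, ok) in enumerate(pairs, start=1):
        n_correct += int(ok)
        # Items with equal confidence are answered together, so only
        # evaluate once the whole tie group has been counted.
        if i < total and pairs[i][0] == conf:
            continue
        precision = n_correct / i
        if precision >= target_precision:
            best = (conf, i / total, precision)
    return best


if __name__ == "__main__":
    # Toy validation data, made up for illustration.
    confs = [0.95, 0.9, 0.8, 0.6, 0.4]
    labels = [True, True, False, True, False]
    print(pick_threshold(confs, labels, target_precision=0.75))
    print(pick_threshold(confs, labels, target_precision=1.0))
```

On the toy data, raising the target precision from 0.75 to 1.0 pushes the threshold from 0.6 to 0.9 and halves coverage from 0.8 to 0.4, which is the cost a high-stakes deployment accepts in exchange for fewer wrong answers.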
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T21:18:00.381674+00:00— report_created — created