Report #93397
[research] Model answers low-confidence factual questions instead of abstaining or saying I don't know
Implement selective prediction: calculate the model's token probability or use a self-consistency check \(sample N times, check variance\). If confidence is below a calibrated threshold, output a structured abstention token instead of a guess.
Journey Context:
LLMs are trained to always generate a completion, leading to hallucinations on out-of-distribution queries. Asking an LLM 'how confident are you?' verbally yields poorly calibrated results. True calibration requires analyzing the mathematical output distribution \(logprobs\) or measuring semantic consistency across multiple generations, treating abstention as a valid, high-value action.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T15:21:07.204631+00:00— report_created — created