Report #15076
[research] LLM either over-refuses (saying I don't know to easy questions) or under-refuses (guessing on obscure questions) when prompted to express uncertainty
Implement selective prediction with a confidence threshold on the model's output probabilities (derived from its logits). Answer only if the probability of the top token or sequence exceeds a calibrated threshold; otherwise, trigger the 'I don't know' fallback.
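A minimal sketch of the thresholding step, assuming the serving stack can return per-token log-probabilities for the generated answer. The function names, the length-normalized confidence score, and the held-out calibration routine are illustrative assumptions, not part of the original report.

```python
import math
from typing import List, Optional, Sequence

FALLBACK = "I don't know"


def sequence_confidence(token_logprobs: Sequence[float]) -> float:
    """Geometric-mean token probability of the generated answer.

    Length normalization is an assumption made here so that long answers
    are not penalized just for having more tokens.
    """
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def answer_or_abstain(answer: str, token_logprobs: Sequence[float],
                      threshold: float) -> str:
    """Return the answer only if its confidence clears the threshold."""
    if sequence_confidence(token_logprobs) >= threshold:
        return answer
    return FALLBACK


def calibrate_threshold(confidences: List[float], correct: List[bool],
                        target_accuracy: float) -> Optional[float]:
    """Pick the lowest threshold whose answered subset meets the target
    accuracy on a labeled held-out set. Returns None if no threshold does."""
    pairs = sorted(zip(confidences, correct), reverse=True)
    best = None
    hits = 0
    for i, (conf, ok) in enumerate(pairs, start=1):
        hits += ok
        # Only evaluate at the end of a group of tied confidences.
        is_group_end = i == len(pairs) or pairs[i][0] != conf
        if is_group_end and hits / i >= target_accuracy:
            best = conf
    return best


if __name__ == "__main__":
    # Hypothetical held-out data: confidence scores and whether each answer was right.
    threshold = calibrate_threshold([0.9, 0.8, 0.6, 0.4],
                                    [True, True, False, False],
                                    target_accuracy=0.9)
    print(answer_or_abstain("Paris", [math.log(0.9), math.log(0.95)], threshold))
```

The threshold is chosen from labeled held-out examples rather than fixed by hand, so the trade-off between coverage and accuracy is explicit and can be re-tuned per task.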
Journey Context:
Prompting a model to 'say I don't know if you are not sure' often degrades calibration: the model starts refusing questions it can actually answer. A more reliable signal is the model's own output probabilities. If the entropy of the output distribution is high, or the top token or sequence has low probability, the model is effectively guessing. Abstention should therefore be enforced by a programmatic threshold, not only by a prompted behavior.
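The entropy check can be sketched as follows, assuming access to the logits (or a full probability distribution) at the answer position. The `max_entropy_bits` cutoff is a hypothetical parameter that would need tuning on held-out data; many hosted APIs expose only top-k log-probabilities, in which case the entropy computed this way is an approximation.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over raw logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_bits(probs: Sequence[float]) -> float:
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def is_guessing(logits: Sequence[float], max_entropy_bits: float) -> bool:
    """Flag the prediction as a guess when the distribution is too flat."""
    return entropy_bits(softmax(logits)) > max_entropy_bits


if __name__ == "__main__":
    print(is_guessing([0.0, 0.0, 0.0, 0.0], max_entropy_bits=1.0))   # flat distribution
    print(is_guessing([10.0, 0.0, 0.0, 0.0], max_entropy_bits=1.0))  # peaked distribution
```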
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T23:11:32.079293+00:00— report_created — created