Report #91739
[research] LLM fails to abstain from answering when it should, or abstains too often when it actually knows the answer
Implement selective prediction: gate each answer on a confidence score derived from the model's token log-probabilities (logprobs), and abstain when that score falls below a threshold calibrated on held-out data. Do not rely on prompt-based abstention ('Say I don't know if unsure').
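A minimal sketch of the gating step follows. It assumes the serving API can return per-token logprobs for the generated answer. It uses the length-normalized sequence probability, exp(mean token logprob), as the confidence score. That choice is an assumption: the minimum token probability or the first-token probability are common alternatives. The names `Prediction`, `sequence_confidence`, and `selective_answer` are hypothetical helpers, not part of any library.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Prediction:
    answer: Optional[str]  # None means the model abstained
    confidence: float      # length-normalized sequence probability in [0, 1]


def sequence_confidence(token_logprobs: Sequence[float]) -> float:
    """Hypothetical scorer: exp(mean token logprob), i.e. the geometric-mean token probability."""
    if not token_logprobs:
        return 0.0  # no tokens generated: treat as zero confidence
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def selective_answer(answer: str, token_logprobs: Sequence[float], threshold: float) -> Prediction:
    """Return the answer only if its confidence meets the (pre-tuned) threshold; otherwise abstain."""
    conf = sequence_confidence(token_logprobs)
    if conf >= threshold:
        return Prediction(answer=answer, confidence=conf)
    return Prediction(answer=None, confidence=conf)


# Usage: a confident answer passes the gate, a low-probability one is withheld.
confident = selective_answer("Paris", [-0.01, -0.02], threshold=0.8)
shaky = selective_answer("1847", [-1.2, -0.9, -2.3], threshold=0.8)
```

When `answer` is `None`, the caller returns "I don't know". The abstention decision therefore comes from the model's probabilities rather than from its reaction to an instruction in the prompt.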
Journey Context:
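Teaching an LLM to say 'I don't know' via prompting often leads to over-abstention on hard-but-answerable questions, or under-abstention on unanswerable ones. The model's internal signal of whether a question is answerable ('knowability') maps poorly onto verbal triggers such as an abstention instruction. Selective prediction, in which the model outputs an answer only if its confidence score exceeds a threshold tuned on labeled held-out data, yields a much better precision/recall tradeoff for factual accuracy. It also makes that tradeoff explicit: raising the threshold lowers coverage (the fraction of questions answered) and raises precision (accuracy on the answered questions).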
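One way to tune the threshold is sketched below. It assumes a held-out set of questions labeled correct or incorrect, each scored with whatever per-answer confidence the deployment uses. The sketch picks the lowest threshold, and therefore the highest coverage, whose answered subset still meets a target precision. The function names and the target-precision criterion are illustrative assumptions, not a prescribed method.

```python
from typing import Optional, Sequence, Tuple

Scored = Sequence[Tuple[float, bool]]  # (confidence, is_correct) pairs from held-out data


def tune_threshold(scored: Scored, target_precision: float) -> Optional[float]:
    """Return the lowest threshold whose answered set reaches target_precision (in-sample).

    Returns None if no threshold reaches the target, i.e. always abstain.
    """
    ranked = sorted(scored, key=lambda x: x[0], reverse=True)
    best: Optional[float] = None
    correct = 0
    for i, (conf, ok) in enumerate(ranked):
        correct += int(ok)
        # A threshold equal to `conf` answers every tied item too, so only
        # evaluate at the end of a tie group.
        if i + 1 < len(ranked) and ranked[i + 1][0] == conf:
            continue
        answered = i + 1
        if correct / answered >= target_precision:
            best = conf  # keep scanning: a lower qualifying threshold means more coverage
    return best


def coverage_and_precision(scored: Scored, threshold: float) -> Tuple[float, float]:
    """Coverage = fraction answered; precision = accuracy on answered (NaN if none answered)."""
    answered = [ok for conf, ok in scored if conf >= threshold]
    coverage = len(answered) / len(scored) if scored else 0.0
    precision = sum(answered) / len(answered) if answered else float("nan")
    return coverage, precision


# Usage on a toy calibration set (made-up scores, for illustration only).
calib = [(0.95, True), (0.9, True), (0.8, True), (0.7, False),
         (0.6, True), (0.4, False), (0.3, False)]
strict = tune_threshold(calib, target_precision=0.9)   # higher precision, lower coverage
lenient = tune_threshold(calib, target_precision=0.75)  # lower precision, higher coverage
```

The chosen threshold only meets the target on the calibration data. Because precision is not monotonic in the threshold, a small calibration set can overfit. Re-check coverage and precision on a separate test split before deploying.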
Lifecycle
2026-06-22T12:34:35.345297+00:00 - report_created - created