Report #62509
[research] LLM answers confidently when it should abstain, or uses 'I don't know' as a generic escape hatch
Implement selective prediction using the model's logprobs. Compute a confidence score from the token log-probabilities of the generated answer (for example, the length-normalized mean) and compare it to a threshold; if the score falls below the threshold, trigger an abstention or tool-use fallback instead of returning the low-confidence string.
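A minimal sketch of the gating step, assuming the caller already has the per-token log-probabilities of the generated answer as a list of floats. The function names, the geometric-mean aggregation, and the default threshold of 0.8 are illustrative assumptions, not part of the original report; how logprobs are obtained depends on the model API in use.

```python
import math
from typing import Callable, Optional, Sequence


def sequence_confidence(token_logprobs: Sequence[float]) -> float:
    """Length-normalized confidence: the geometric mean of token probabilities.

    Assumption: this aggregation is one common choice; the minimum token
    probability or the full sequence probability are alternatives.
    """
    if not token_logprobs:
        return 0.0  # no tokens means no evidence, so treat as zero confidence
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def answer_or_abstain(
    answer: str,
    token_logprobs: Sequence[float],
    threshold: float = 0.8,  # hypothetical default; tune on held-out data
    fallback: Optional[Callable[[], str]] = None,
) -> Optional[str]:
    """Return the answer if confidence clears the threshold.

    Otherwise call the fallback (e.g. a tool-use or retrieval path) if one
    is given, or return None to signal abstention.
    """
    if sequence_confidence(token_logprobs) >= threshold:
        return answer
    if fallback is not None:
        return fallback()
    return None
```

For example, `answer_or_abstain("Paris", [-0.01, -0.02])` returns the answer, while the same call with logprobs `[-2.0, -1.0]` returns `None` unless a fallback is supplied.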
Journey Context:
Prompting a model to 'say I don't know if unsure' tends to cause over-abstention on hard but answerable questions, or under-abstention when the model is miscalibrated (it stays confident even in its wrong answers). Verbalized uncertainty is often poorly calibrated. Logprob-based selective prediction lets the model answer only when its internal probability exceeds a chosen threshold, which makes the trade-off between coverage and factuality explicit and tunable. This is not a hard guarantee: token probabilities can themselves be miscalibrated, so the threshold should be validated on held-out labeled data.
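One way to pick the threshold is to sweep it over a labeled validation set and keep the setting with the highest coverage whose error rate among answered questions stays under a target. The sketch below assumes each validation example has a confidence score and a correctness label; `pick_threshold` and `max_risk` are hypothetical names, and the resulting risk is an empirical estimate on that set, not a bound on future data.

```python
from typing import Optional, Sequence, Tuple


def pick_threshold(
    confidences: Sequence[float],
    correct: Sequence[bool],
    max_risk: float,
) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, coverage, risk) with the highest coverage whose
    empirical risk (error rate among answered examples) is <= max_risk.

    Returns None if no threshold meets the risk target.
    """
    # Sort from most to least confident; answering the top-i examples
    # corresponds to setting the threshold at the i-th confidence.
    pairs = sorted(zip(confidences, correct), key=lambda p: p[0], reverse=True)
    n = len(pairs)
    best = None
    errors = 0
    for i, (conf, ok) in enumerate(pairs, start=1):
        if not ok:
            errors += 1
        # Tied confidences are answered together, so only evaluate
        # at the last member of a tie group.
        if i < n and pairs[i][0] == conf:
            continue
        risk = errors / i
        if risk <= max_risk:
            best = (conf, i / n, risk)  # later hits have higher coverage
    return best


result = pick_threshold(
    confidences=[0.95, 0.9, 0.7, 0.6, 0.3],
    correct=[True, True, False, True, False],
    max_risk=0.25,
)
print(result)  # (0.6, 0.8, 0.25)
```

In this toy example, answering everything with confidence at or above 0.6 covers 80% of questions at a 25% error rate; a stricter `max_risk` would push the threshold up and coverage down.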
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:24:20.413001+00:00— report_created — created