Report #3472
[research] LLM is overconfident and answers when it should abstain, or gives a generic refusal when it actually knows the answer
Implement selective prediction: prompt the model to output a confidence score alongside its answer, and abstain ('I don't know') whenever the score falls below a tunable threshold. Calibrate that threshold against a validation set.
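A minimal sketch of the abstention step, assuming the model has been prompted to reply in a two-field format ("Answer: ..." followed by "Confidence: <number between 0 and 1>"). The format, the `selective_answer` name, and the `ABSTAIN` constant are illustrative assumptions, not something this report prescribes. Unparseable or out-of-range output is treated as an abstention.

```python
import re

ABSTAIN = "I don't know"

# Hypothetical output format (an assumption for this sketch):
#   Answer: <free text>
#   Confidence: <number in [0, 1]>
_PATTERN = re.compile(
    r"Answer:\s*(?P<answer>.+?)\s*Confidence:\s*(?P<conf>[01](?:\.\d+)?)\s*$",
    re.DOTALL | re.IGNORECASE,
)


def selective_answer(raw_output: str, threshold: float) -> str:
    """Return the model's answer, or ABSTAIN if confidence is below threshold."""
    match = _PATTERN.search(raw_output.strip())
    if match is None:
        # Output did not follow the expected format: safer to abstain.
        return ABSTAIN
    confidence = float(match.group("conf"))
    if not 0.0 <= confidence <= 1.0 or confidence < threshold:
        return ABSTAIN
    return match.group("answer").strip()


if __name__ == "__main__":
    print(selective_answer("Answer: Paris\nConfidence: 0.92", threshold=0.7))
    print(selective_answer("Answer: Lyon\nConfidence: 0.30", threshold=0.7))
```

The threshold is passed in rather than hard-coded so that it can be tuned separately from the prompt.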
Journey Context:
Default LLMs are poorly calibrated; their token probabilities do not reliably correlate with factual correctness. A high probability doesn't mean the fact is true. Simply prompting 'say I don't know if you aren't sure' leads to excessive refusal (over-abstention) on hard but answerable questions. The right approach is to train or prompt for an explicit self-assessment, then use an external threshold to trade off coverage (answering more) against accuracy (being right when answering).
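One way the coverage/accuracy trade-off could be tuned is sketched below. It assumes a validation set of (self-reported confidence, was the answer correct) pairs and a target accuracy chosen by the operator; the function name `calibrate_threshold` and the sample data are hypothetical. The sketch picks the lowest threshold whose answered subset still meets the target, which maximizes coverage under that constraint.

```python
from typing import List, Optional, Tuple


def calibrate_threshold(
    validation: List[Tuple[float, bool]],
    target_accuracy: float,
) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, coverage, accuracy), or None if the target is unreachable.

    The model answers when confidence >= threshold and abstains otherwise.
    """
    # Walk from most to least confident, tracking accuracy of the answered prefix.
    ranked = sorted(validation, key=lambda pair: pair[0], reverse=True)
    best = None
    correct = 0
    for i, (confidence, is_correct) in enumerate(ranked, start=1):
        correct += int(is_correct)
        # Items with equal confidence are answered together, so only
        # evaluate at the last item of each tie group.
        if i < len(ranked) and ranked[i][0] == confidence:
            continue
        accuracy = correct / i
        if accuracy >= target_accuracy:
            # A lower threshold that still meets the target gives more coverage.
            best = (confidence, i / len(ranked), accuracy)
    return best


if __name__ == "__main__":
    # Made-up validation data, for illustration only.
    val = [(0.95, True), (0.9, True), (0.8, False),
           (0.7, True), (0.6, False), (0.3, False)]
    print(calibrate_threshold(val, target_accuracy=0.75))
```

On this made-up data the chosen threshold is 0.7: four of six questions are answered, three of them correctly. Raising the target accuracy pushes the threshold up and the coverage down.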
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T16:57:53.086810+00:00— report_created — created