Report #8113
[research] LLM answers obscure questions with high confidence instead of abstaining when it lacks knowledge
Implement selective prediction in one of two ways: fine-tune for calibrated abstention (e.g., teach the model to output 'I don't know' for out-of-distribution inputs or low-probability token sequences), or use conformal prediction to set a statistically grounded confidence threshold below which the model abstains.
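One possible way to prepare data for abstention fine-tuning is sketched below. It is an illustration, not a method confirmed by this report: questions the base model usually answers incorrectly get an 'I don't know' target, and the rest keep their gold answer. The function name `build_abstention_targets`, the field names (`question`, `gold`, `samples`), the exact-match comparison, and the 0.5 accuracy cutoff are all assumptions made for the example.

```python
def build_abstention_targets(examples, min_accuracy=0.5, abstain_text="I don't know"):
    """Relabel questions the base model usually gets wrong with an abstention target.

    examples: list of dicts with hypothetical keys:
      'question' - the prompt text
      'gold'     - the reference answer
      'samples'  - several answers sampled from the base model
    min_accuracy: assumed cutoff; below it the target becomes abstain_text.
    """
    relabeled = []
    for ex in examples:
        gold = ex["gold"].strip().lower()
        samples = ex["samples"]
        # Exact match is a crude correctness check, used here only for brevity.
        correct = sum(1 for s in samples if s.strip().lower() == gold)
        accuracy = correct / len(samples) if samples else 0.0
        target = ex["gold"] if accuracy >= min_accuracy else abstain_text
        relabeled.append({"prompt": ex["question"], "completion": target})
    return relabeled
```

The resulting prompt/completion pairs could then be passed to whatever supervised fine-tuning pipeline is in use. The cutoff controls how readily the tuned model abstains and would need to be validated on held-out questions.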
Journey Context:
Standard LLMs are trained to always generate a response, so they are poorly calibrated for abstention. Token probabilities derived from the logits are often overconfident and correlate weakly with factual accuracy. Prompting the model to 'say I don't know if you aren't sure' does not fix this: it tends to cause over-abstention on easy questions or under-abstention on hard ones. Reliable abstention requires either specialized fine-tuning or a statistical wrapper such as conformal prediction, which gives a coverage guarantee provided the calibration data and the incoming questions come from the same distribution.
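The conformal prediction wrapper can be illustrated with a minimal split-conformal sketch. It assumes a setting where the model assigns a probability to each of a small set of candidate answers (for example, multiple choice) and where a held-out calibration set with known correct answers is available. The function names and the "abstain unless the prediction set has exactly one member" rule are choices made for this example, not something the report specifies.

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Split-conformal quantile of calibration nonconformity scores.

    cal_scores: nonconformity scores on a held-out calibration set, here
                1 - (probability the model gave to the correct answer).
    alpha: target miscoverage rate, e.g. 0.1 for roughly 90% coverage.
    """
    n = len(cal_scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: every answer is admitted.
        return float("inf")
    return sorted(cal_scores)[rank - 1]

def prediction_set(probs, qhat):
    """Candidate answers whose nonconformity score 1 - p is at most qhat."""
    return [answer for answer, p in probs.items() if 1.0 - p <= qhat]

def answer_or_abstain(probs, qhat, abstain_text="I don't know"):
    """Answer only when the conformal set contains exactly one candidate."""
    candidates = prediction_set(probs, qhat)
    return candidates[0] if len(candidates) == 1 else abstain_text
```

A usage example under the same assumptions:

```python
cal_scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.15, 0.25, 0.35, 0.45]
qhat = conformal_threshold(cal_scores, alpha=0.2)
print(answer_or_abstain({"Paris": 0.90, "Lyon": 0.07, "Nice": 0.03}, qhat))
print(answer_or_abstain({"A": 0.40, "B": 0.35, "C": 0.25}, qhat))
```

The coverage guarantee is marginal (it holds on average over questions, not for each one) and depends on the calibration and test questions being exchangeable. For free-form generation, the candidate set and the score would have to be defined differently, which this sketch does not attempt.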
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T04:41:21.734317+00:00— report_created — created