Report #29833
[research] LLM guesses answers instead of expressing calibrated uncertainty or abstaining
Implement selective answering by thresholding the model's token probabilities or by using a dedicated calibration classifier. Allow the agent to output "I don't know" when its confidence falls below a set threshold.
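A minimal sketch of the token-probability variant, assuming the serving stack can return per-token log-probabilities for the generated answer. The names `sequence_confidence` and `answer_or_abstain`, the geometric-mean aggregation, and the default threshold of 0.75 are illustrative assumptions, not part of this report; other aggregations (minimum token probability, for example) may suit some tasks better.

```python
import math
from typing import Sequence

ABSTAIN = "I don't know"


def sequence_confidence(token_logprobs: Sequence[float]) -> float:
    """Geometric mean of the token probabilities (exp of the mean log-prob).

    Returns 0.0 for an empty sequence so that missing scores lead to abstention.
    """
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def answer_or_abstain(
    answer: str,
    token_logprobs: Sequence[float],
    threshold: float = 0.75,  # hypothetical default; tune on held-out data
) -> str:
    """Return the answer only if its confidence reaches the threshold."""
    if sequence_confidence(token_logprobs) >= threshold:
        return answer
    return ABSTAIN
```

Usage: `answer_or_abstain("Paris", [math.log(0.9), math.log(0.95)])` keeps the answer, while the same call with log-probabilities for 0.3 and 0.5 falls below the threshold and returns the abstention string.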
Journey Context:
LLMs are trained to always produce a response, which leaves them poorly calibrated about their own uncertainty: they will confidently hallucinate rather than abstain. Simply prompting "say I don't know if you aren't sure" is insufficient; the model's internal confidence scores must be explicitly thresholded and mapped to an abstention action.
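One way to choose the threshold, sketched under the assumption that a labeled validation set of (confidence, was_correct) pairs is available. The helper `pick_threshold` and the target-accuracy criterion are hypothetical illustrations, not something this report prescribes. It returns the lowest threshold at which the answered subset still meets the target accuracy, which maximizes coverage under that constraint.

```python
from typing import List, Optional, Tuple


def pick_threshold(
    scored: List[Tuple[float, bool]],
    target_accuracy: float,
) -> Optional[float]:
    """Lowest confidence cutoff whose answered set meets target_accuracy.

    `scored` holds (confidence, was_correct) pairs from validation data.
    Returns None if no cutoff reaches the target.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    best: Optional[float] = None
    correct = 0
    for i, (conf, ok) in enumerate(ranked, start=1):
        correct += ok
        # Items with equal confidence are answered together, so only
        # evaluate accuracy at the last item of each tie group.
        if i < len(ranked) and ranked[i][0] == conf:
            continue
        if correct / i >= target_accuracy:
            best = conf
    return best
```

Answers whose confidence is at or above the returned value are kept; everything below it maps to the abstention action. If the function returns `None`, no cutoff on this data reaches the target and the confidence signal itself likely needs recalibration.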
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T04:27:56.954754+00:00— report_created — created