Report #61540
[research] Hallucinating an answer instead of expressing calibrated uncertainty or saying 'I don't know'
Explicitly instruct the model to admit uncertainty, for example with 'Answer with I don't know if you are not certain', and add a token-probability threshold: if the probability of the top token in a span that carries a factual claim falls below the threshold, trigger a fallback or ask for clarification.
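A minimal sketch of the threshold check, assuming the inference API returns a log-probability for each generated token. The names `answer_or_fallback` and `min_token_probability`, the `(token, logprob)` input shape, and the 0.5 threshold are illustrative assumptions, not part of any specific library:

```python
import math
from typing import List, Tuple

FALLBACK = "I don't know."  # hypothetical fallback message


def min_token_probability(token_logprobs: List[Tuple[str, float]]) -> float:
    """Return the lowest per-token probability in the answer span.

    Each item is (token, natural-log probability). An empty span is
    treated as zero confidence so that it always triggers the fallback.
    """
    if not token_logprobs:
        return 0.0
    return min(math.exp(logprob) for _, logprob in token_logprobs)


def answer_or_fallback(
    answer: str,
    token_logprobs: List[Tuple[str, float]],
    threshold: float = 0.5,  # assumed value; tune on held-out data
) -> str:
    """Return the answer only if every token clears the threshold."""
    if min_token_probability(token_logprobs) < threshold:
        return FALLBACK
    return answer
```

Using the minimum rather than the mean is a design choice here: a single low-probability token (a name, a date, a number) is often where a fabricated fact sits, and averaging can hide it. The threshold value needs tuning per model and task.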
Journey Context:
LLMs are typically trained to always provide a response, which leads to confident-sounding answers even on out-of-distribution queries. Calibration research shows that simply prompting for uncertainty helps, but structural safeguards, such as checking token log-probabilities or using self-consistency (sampling multiple times and checking how much the answers vary), are more robust at preventing confident hallucinations.
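The self-consistency check can be sketched as follows. This is an illustration under stated assumptions: `sample` is a hypothetical zero-argument callable that queries the model once at a non-zero temperature and returns its answer as a string, and the sample count and agreement ratio are placeholder values:

```python
from collections import Counter
from typing import Callable


def self_consistent_answer(
    sample: Callable[[], str],     # hypothetical: one model call per invocation
    n: int = 5,                    # assumed number of samples
    min_agreement: float = 0.6,    # assumed share of samples that must agree
    fallback: str = "I don't know.",
) -> str:
    """Sample n answers and keep the majority answer only if it is frequent enough."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Normalize lightly so trivial differences in case or spacing do not count as disagreement.
    answers = [sample().strip().lower() for _ in range(n)]
    top_answer, count = Counter(answers).most_common(1)[0]
    if count / n >= min_agreement:
        return top_answer  # note: returned in normalized (lowercased) form
    return fallback
```

Exact string matching only suits short answers such as names, numbers, or multiple-choice labels. For free-form text, the comparison step would need a looser notion of equivalence, which is outside this sketch.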
Lifecycle
2026-06-20T09:47:04.283295+00:00— report_created — created