Report #2300
[research] Failing to express uncertainty or say 'I don't know' when the model has low epistemic certainty
Implement calibrated uncertainty by asking the model to rate its own confidence on a 1-10 scale before answering, then aborting or flagging the response when the rating falls below a threshold. Alternatively, use token logprobs to detect low certainty.
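A minimal sketch of the self-assessment gate might look like the following. Everything here is an assumption for illustration: `ask_model` is a hypothetical callable standing in for whatever client sends a prompt and returns the reply text, the prompt wording is made up, and the threshold of 7 is arbitrary and would need tuning.

```python
import re
from typing import Callable, Optional

# Hypothetical prompt wording; adjust to the model in use.
CONFIDENCE_PROMPT = (
    "On a scale of 1-10, how confident are you that you can answer the "
    "following question correctly? Reply with a single integer.\n\n"
    "Question: {question}"
)


def parse_confidence(reply: str) -> Optional[int]:
    """Return the first standalone integer in 1..10 found in the reply, else None."""
    match = re.search(r"\b(10|[1-9])\b", reply)
    return int(match.group(1)) if match else None


def answer_with_gate(
    question: str,
    ask_model: Callable[[str], str],  # hypothetical: prompt in, reply text out
    threshold: int = 7,               # illustrative value, not a recommendation
) -> dict:
    """Ask for a confidence rating first; only request an answer if it clears the threshold."""
    rating_reply = ask_model(CONFIDENCE_PROMPT.format(question=question))
    score = parse_confidence(rating_reply)

    # An unparseable rating is treated the same as low confidence.
    if score is None or score < threshold:
        return {"answer": None, "confidence": score, "flagged": True}

    return {"answer": ask_model(question), "confidence": score, "flagged": False}
```

A caller would check `flagged` and fall back to an "I don't know" message or a human review step instead of showing an answer. Treating an unparseable rating as low confidence is a design choice made here so that the gate fails closed.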
Journey Context:
By default, LLMs have no internal 'I don't know' trigger and tend to generate confident-sounding answers regardless of how much they actually know. Prompting for self-assessment helps, though self-assessed confidence is often miscalibrated. Logprob-based calibration is empirically stronger but harder to implement via standard APIs.
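For the logprob route, one possible sketch is shown below. It assumes the per-token log probabilities of the generated answer have already been obtained as a list of floats; how to request them differs between APIs and is not covered here. The function names and both thresholds are hypothetical and would need calibrating against real data.

```python
import math
from typing import Sequence


def mean_token_probability(logprobs: Sequence[float]) -> float:
    """Geometric mean of the token probabilities, i.e. exp(mean log probability)."""
    if not logprobs:
        raise ValueError("logprobs must be non-empty")
    return math.exp(sum(logprobs) / len(logprobs))


def is_low_certainty(
    logprobs: Sequence[float],
    threshold: float = 0.5,       # illustrative cutoff on the geometric mean
    min_token_prob: float = 0.1,  # illustrative cutoff on the weakest token
) -> bool:
    """Flag a generation if it is weak on average or contains one very unlikely token."""
    average = mean_token_probability(logprobs)
    weakest = math.exp(min(logprobs))
    return average < threshold or weakest < min_token_prob
```

Checking the weakest token alongside the average is a deliberate choice in this sketch: a single low-probability token (a name, a number, a date) can be averaged away in a long, otherwise fluent answer.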
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T10:55:13.747868+00:00— report_created — created