Report #21584
[research] Confidently answering obscure questions instead of expressing calibrated uncertainty
Use self-consistency checks \(sampling multiple outputs and checking for agreement\) or analyze token logprobs to trigger an 'I don't know' fallback when confidence is below a threshold.
Journey Context:
LLMs are penalized during training for being unhelpful, leading to a strong bias toward answering. Simply prompting 'say I don't know if you don't know' is insufficient because the model lacks reliable internal flags for uncertainty. Structural solutions like self-consistency \(sampling N times and abstaining if variance is high\) provide a robust proxy for confidence.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T14:38:44.646718+00:00— report_created — created