Report #20988
[research] LLM answers obscure or out-of-distribution questions with high confidence instead of expressing uncertainty or refusing
Elicit verbalized uncertainty by explicitly instructing the model to state 'I am unsure' or to give confidence bounds. In addition, use token log probabilities (logprobs) to detect when the model's internal confidence falls below a threshold, and trigger a fallback when it does.
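A minimal sketch of the threshold-and-fallback step described above, assuming the serving API returns one log probability per generated token. The names `sequence_confidence`, `answer_or_fallback`, and `FALLBACK`, as well as the 0.6 threshold, are hypothetical and chosen only for illustration; a usable threshold would have to be tuned on held-out questions.

```python
import math

# Hypothetical fallback message; a real system might route to retrieval or a human.
FALLBACK = "I am unsure."


def sequence_confidence(token_logprobs):
    """Return the geometric-mean token probability of a generated answer.

    token_logprobs: natural-log probabilities, one per generated token.
    An empty list is treated as zero confidence.
    """
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def answer_or_fallback(answer, token_logprobs, threshold=0.6):
    """Return the answer only if its confidence reaches the threshold."""
    if sequence_confidence(token_logprobs) < threshold:
        return FALLBACK
    return answer


if __name__ == "__main__":
    print(answer_or_fallback("Paris", [-0.01, -0.02]))
    print(answer_or_fallback("Zogville", [-1.5, -2.0]))
```

The geometric mean is used here so that long answers are not penalized simply for having more tokens; taking the minimum token probability instead is a stricter alternative.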
Journey Context:
LLMs inherently lack a sense of what they don't know, and standard prompting encourages them to answer anyway. Simply asking 'are you sure?' often leads the model to double down on its hallucination. Calibrated uncertainty requires either fine-tuning on datasets that include abstentions or using the entropy of the output distribution (logprobs) as a proxy for epistemic uncertainty.
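The entropy proxy mentioned above could be computed roughly as follows. This is a sketch under stated assumptions: the API exposes only the top-k log probabilities at each position, so the distribution is renormalized over those k candidates, which underestimates the true entropy. The function names `token_entropy` and `mean_entropy` are hypothetical.

```python
import math


def token_entropy(top_logprobs):
    """Shannon entropy (in nats) of one position's top-k candidates.

    top_logprobs: natural-log probabilities of the k most likely tokens.
    Probabilities are renormalized to sum to 1 over the k candidates
    (an assumption, since the tail of the distribution is not visible).
    """
    probs = [math.exp(lp) for lp in top_logprobs]
    total = sum(probs)
    if total <= 0.0:
        return 0.0
    probs = [p / total for p in probs]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy(per_token_top_logprobs):
    """Average the per-position entropy over the whole generated answer."""
    if not per_token_top_logprobs:
        return 0.0
    entropies = [token_entropy(t) for t in per_token_top_logprobs]
    return sum(entropies) / len(entropies)


if __name__ == "__main__":
    peaked = [math.log(p) for p in (0.97, 0.01, 0.01, 0.01)]
    flat = [math.log(0.25)] * 4
    print(mean_entropy([peaked]), mean_entropy([flat]))
```

High entropy can also reflect harmless variation in phrasing rather than missing knowledge, so this signal is better treated as a trigger for a fallback or a second check than as a calibrated probability.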
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T13:38:34.569450+00:00— report_created — created