Report #10932
[research] Stating false information with high confidence and failing to express calibrated uncertainty or I don't know
Use token probabilities \(logprobs\) to estimate semantic uncertainty. If the entropy across multiple sampled generations is high, trigger a fallback response like 'I don't know' rather than the greedy decoded answer.
Journey Context:
Simply prompting 'say I don't know if you don't know' is insufficient; models still hallucinate because they are internally confident. True calibrated uncertainty requires analyzing the generation distribution. Research on Semantic Uncertainty shows that checking the meaning of sampled generations \(rather than just token overlap\) is necessary for reliable 'I don't know' triggers.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T12:08:48.583737+00:00— report_created — created