Report #14656
[research] LLM answers obscure or ambiguous questions with high confidence instead of expressing uncertainty or refusing
Use token probabilities (logprobs) or self-consistency checks (sampling multiple outputs and checking variance) to trigger an 'I don't know' fallback when confidence is below a threshold.
Journey Context:
LLMs are often poorly calibrated: the confidence they state in words does not track how often they are actually right, so relying on the model to express its own uncertainty fails. The right call is programmatic: sample the model several times and, if the answers diverge significantly or the top token logprob falls below a tuned threshold, abort or escalate. The tradeoff is compute cost, since self-consistency needs several generations per query, but it is the most reliable anti-hallucination guardrail.
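A minimal sketch of both checks described above, not a definitive implementation. Everything named here is an assumption for illustration: `sample` is a hypothetical zero-argument callable that returns one model answer per call, `logprobs` is assumed to be the list of per-token log-probabilities your provider returns for a generated answer, and the thresholds `min_agreement=0.6` and `min_prob=0.7` are placeholders that would need tuning on your own data. Exact-match voting is also an assumption that only suits short factual answers; longer answers would need some form of semantic comparison.

```python
import math
from collections import Counter
from typing import Callable, Sequence

FALLBACK = "I don't know"  # hypothetical fallback message


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial variants vote together."""
    return " ".join(answer.lower().split())


def self_consistent_answer(
    sample: Callable[[], str],    # hypothetical: one model call per invocation
    n: int = 5,                   # number of samples (placeholder)
    min_agreement: float = 0.6,   # placeholder threshold, tune on real data
) -> str:
    """Sample n answers and return the majority answer only if enough agree."""
    raw = [sample() for _ in range(n)]
    keys = [normalize(a) for a in raw]
    top, count = Counter(keys).most_common(1)[0]
    if count / n < min_agreement:
        return FALLBACK
    # Return the first raw answer that matches the winning normalized form.
    return raw[keys.index(top)]


def gate_on_logprobs(
    answer: str,
    logprobs: Sequence[float],    # assumed: per-token log-probabilities
    min_prob: float = 0.7,        # placeholder threshold, tune on real data
) -> str:
    """Return the answer only if the geometric-mean token probability is high."""
    if not logprobs:
        return FALLBACK
    mean_prob = math.exp(sum(logprobs) / len(logprobs))
    return answer if mean_prob >= min_prob else FALLBACK
```

In use, `sample` would wrap a model call at a non-zero temperature so that repeated calls can actually disagree. Either function returning `FALLBACK` is the point at which to abort or escalate.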
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T22:10:34.717683+00:00— report_created — created