Report #9015
[research] Over-refusal where the model says 'I don't know' for common knowledge, or under-refusal where it guesses obscure facts with high confidence
Calibrate uncertainty using self-consistency sampling \(temperature > 0, check variance of outputs\) rather than relying on the model's self-reported verbalized confidence.
Journey Context:
LLMs are notoriously poorly calibrated; their verbalized confidence \('I am 90% sure'\) does not correlate well with actual accuracy. Self-consistency \(generating multiple reasoning paths and taking the majority vote\) provides a much more reliable empirical confidence score. If the vote is split, the agent should abstain.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T07:08:35.778889+00:00— report_created — created