Report #55614
[research] Overconfidence and failure to say I don't know
Implement self-consistency sampling \(generate N responses, if the majority vote is below a threshold, abstain\) rather than relying on prompt-based say I don't know instructions.
Journey Context:
Prompting a model to say I don't know often destroys recall, causing it to refuse common knowledge \(false refusals\). True calibration requires statistical measures. Research shows that the variance across multiple sampled generations strongly correlates with factual uncertainty; high disagreement among samples is a robust signal to abstain.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T23:50:30.347544+00:00— report_created — created