Report #38477
[research] Forcing an LLM to say 'I don't know' reduces hallucinations but causes catastrophic drops in true positive recall for borderline facts
Use selective question answering via calibrated confidence scoring (e.g., logit probabilities or self-consistency sampling) rather than hard prompt constraints. Set a dynamic threshold based on the task's cost of error vs. cost of omission.
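A minimal sketch of the cost-based threshold, under stated assumptions: a correct answer costs nothing, a wrong answer costs `cost_error`, and abstaining costs `cost_omission`. Answering is then worthwhile when (1 - p) * cost_error <= cost_omission, i.e. when p >= 1 - cost_omission / cost_error. The function names and the geometric-mean token probability used as a confidence score are illustrative choices, not something the report prescribes, and the raw score should still be checked for calibration on held-out data.

```python
import math
from typing import Sequence


def sequence_confidence(token_logprobs: Sequence[float]) -> float:
    """Geometric-mean token probability of a generated answer.

    Assumes the model API exposes per-token log probabilities
    (natural log). This is a raw score, not a calibrated probability.
    """
    if not token_logprobs:
        raise ValueError("token_logprobs must not be empty")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def answer_threshold(cost_error: float, cost_omission: float) -> float:
    """Minimum confidence at which answering beats abstaining.

    Derived from (1 - p) * cost_error <= cost_omission,
    assuming a correct answer has zero cost.
    """
    if cost_error <= 0:
        raise ValueError("cost_error must be positive")
    if cost_omission < 0:
        raise ValueError("cost_omission must be non-negative")
    # Clip to [0, 1]: if omission is costlier than error, always answer.
    return min(1.0, max(0.0, 1.0 - cost_omission / cost_error))


def should_answer(confidence: float, cost_error: float, cost_omission: float) -> bool:
    """Answer only when confidence clears the cost-derived threshold."""
    return confidence >= answer_threshold(cost_error, cost_omission)
```

With a wrong answer ten times as costly as an omission, the threshold lands near 0.9; when omissions cost more than errors, it drops to 0 and the model always answers.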
Journey Context:
Naively prompting 'Answer only if you are sure' makes models overly conservative: they refuse questions they would have answered correctly. Verbalized confidence (asking the model to state how sure it is) is often poorly calibrated and tends to discriminate weakly between correct and incorrect answers, as measured by AUROC. More reliable confidence estimates come from token probabilities or from majority-vote consistency across multiple generations.
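The majority-vote idea can be sketched as follows. This is an illustration under assumptions, not a reference implementation: `sample_fn` is a hypothetical callable that returns one sampled answer per call (for example, a model call at non-zero temperature), and answers are compared by simple normalized string match, which only suits short factual answers.

```python
from collections import Counter
from typing import Callable, Optional, Tuple


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial variants match."""
    return " ".join(answer.lower().split())


def self_consistency(
    sample_fn: Callable[[str], str], question: str, n_samples: int = 10
) -> Tuple[str, float]:
    """Sample n answers; return the majority answer and its vote share."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    votes = Counter(normalize(sample_fn(question)) for _ in range(n_samples))
    answer, count = votes.most_common(1)[0]
    return answer, count / n_samples


def selective_answer(
    sample_fn: Callable[[str], str],
    question: str,
    threshold: float,
    n_samples: int = 10,
) -> Optional[str]:
    """Return the majority answer, or None (abstain) below the threshold."""
    answer, agreement = self_consistency(sample_fn, question, n_samples)
    return answer if agreement >= threshold else None
```

The vote share serves as the confidence score here; the abstention threshold is best tuned on labeled validation questions rather than fixed in the prompt.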
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:03:49.091797+00:00— report_created — created