Report #37010
[research] Model answers obscure questions with high confidence instead of abstaining or saying 'I don't know'
Implement selective question answering by prompting the model to output a private 'confidence' score (0-100) before the public answer, and programmatically override the output to 'I don't know' if the score falls below a calibrated threshold (e.g., 70).
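A minimal sketch of the programmatic override, assuming the model has been prompted to emit a 'CONFIDENCE:' line followed by an 'ANSWER:' line. That output format, the PROMPT_SUFFIX wording, and the gate_answer name are illustrative assumptions, not part of this report. The sketch abstains whenever the score is missing, malformed, or out of range, so a parsing failure never leaks an ungated answer.

```python
import re

ABSTAIN = "I don't know"

# Hypothetical instruction appended to the prompt; the exact wording is an assumption.
PROMPT_SUFFIX = (
    "First write 'CONFIDENCE: <integer 0-100>' on its own line, "
    "then 'ANSWER: <your answer>' on the next line."
)

# The confidence line must contain only the label and a 1-3 digit integer.
_CONF_RE = re.compile(r"^\s*CONFIDENCE:\s*(\d{1,3})\s*$", re.MULTILINE)
# The answer is everything after 'ANSWER:' to the end of the output.
_ANS_RE = re.compile(r"^\s*ANSWER:\s*(.+)$", re.MULTILINE | re.DOTALL)


def gate_answer(raw_output: str, threshold: int = 70) -> str:
    """Return the model's answer only if its self-reported confidence
    is at or above the threshold; otherwise return the abstention text."""
    conf_match = _CONF_RE.search(raw_output)
    ans_match = _ANS_RE.search(raw_output)
    if conf_match is None or ans_match is None:
        return ABSTAIN  # fail closed on unparseable output
    confidence = int(conf_match.group(1))
    if confidence > 100 or confidence < threshold:
        return ABSTAIN
    # The confidence line is never returned, so the score stays private.
    return ans_match.group(1).strip()
```

For example, gate_answer("CONFIDENCE: 40\nANSWER: Paris") returns the abstention text, while the same output with a score of 85 returns "Paris".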
Journey Context:
LLMs are trained to be helpful, which biases them toward always answering. Simply prompting 'say I don't know if you aren't sure' leads to unpredictable thresholding: the model sometimes over-abstains on easy questions and sometimes hallucinates on hard ones. Decoupling the confidence assessment from the answer generation, and enforcing a hard programmatic cutoff, replaces that implicit threshold with an explicit one that can be calibrated against labeled data.
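One way the cutoff might be calibrated, sketched under assumptions: given a labeled development set of (confidence, was_correct) pairs, pick the lowest threshold at which the answers that pass the gate reach a target accuracy. The pick_threshold name, the 0.9 target, and the sample data are hypothetical and not taken from this report.

```python
from typing import Iterable, Optional, Tuple


def pick_threshold(
    samples: Iterable[Tuple[int, bool]],
    target_precision: float = 0.9,
) -> Optional[int]:
    """Return the lowest threshold (0-100) at which the fraction of correct
    answers among those at or above the threshold meets the target.
    Returns None if no threshold reaches the target."""
    samples = list(samples)
    for threshold in range(0, 101):
        # Correctness flags of the questions that would still be answered.
        answered = [ok for conf, ok in samples if conf >= threshold]
        if not answered:
            break  # nothing left to answer at this or any higher threshold
        if sum(answered) / len(answered) >= target_precision:
            return threshold
    return None


# Hypothetical dev-set results: (self-reported confidence, answer was correct).
dev_set = [(95, True), (90, True), (80, False), (60, True), (30, False)]
chosen = pick_threshold(dev_set, target_precision=0.9)
```

Choosing the lowest qualifying threshold keeps coverage as high as possible for the given accuracy target; a result of None suggests the self-reported scores do not separate correct from incorrect answers well enough on that data.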
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T16:35:42.660745+00:00— report_created — created