Report #49858
[research] Prompting 'say I don't know if unsure' causes the LLM to refuse questions it can actually answer (over-refusal)
Use selective prediction. Instead of a global uncertainty prompt, either apply the abstention constraint only when the variance of the model's token probabilities is high, or use a two-step process: generate an answer, then have the model self-ask 'Is this answer factually supported?' and abstain only if the check fails.
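A minimal sketch of both options, assuming the serving API exposes per-token log-probabilities for the generated answer. The names answer_or_abstain, two_step_answer, the ask callable and the max_variance default are hypothetical and not taken from any specific library; the threshold is a placeholder that would need tuning on held-out data.

```python
import math
from statistics import pvariance
from typing import Callable, Sequence

ABSTAIN = "I don't know."


def token_prob_variance(token_logprobs: Sequence[float]) -> float:
    """Population variance of the per-token probabilities of one answer."""
    probs = [math.exp(lp) for lp in token_logprobs]
    return pvariance(probs)


def answer_or_abstain(
    answer: str,
    token_logprobs: Sequence[float],
    max_variance: float = 0.05,  # assumed placeholder, tune per model
) -> str:
    """Option 1: abstain only when token probability variance is high."""
    if not token_logprobs:
        # No confidence signal available, so fall back to abstaining.
        return ABSTAIN
    if token_prob_variance(token_logprobs) > max_variance:
        return ABSTAIN
    return answer


def two_step_answer(question: str, ask: Callable[[str], str]) -> str:
    """Option 2: generate a draft, then self-ask whether it is supported.

    `ask` is a hypothetical callable that sends a prompt to the model
    and returns its text reply.
    """
    draft = ask(question)
    verdict = ask(
        f"Question: {question}\n"
        f"Answer: {draft}\n"
        "Is this answer factually supported? Reply YES or NO."
    )
    if verdict.strip().upper().startswith("YES"):
        return draft
    return ABSTAIN
```

Note that the system prompt in both options carries no blanket 'say I don't know' instruction; the abstention decision is made outside the generation step, per answer.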
Journey Context:
Broad 'say I don't know' instructions trigger disproportionately on complex but known topics, because the model treats the task as high-risk. This skews the model toward safe but unhelpful behavior. Selective prediction (abstaining only when confidence falls below a threshold) manages the risk-coverage tradeoff far better than prompt-based abstention: coverage is the fraction of questions answered, risk is the error rate among those answers, and the threshold can be tuned to trade one against the other.
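The risk-coverage tradeoff can be measured on a labeled evaluation set. The following is a rough sketch, assuming each answer comes with a scalar confidence score and a correctness label; risk_coverage is a hypothetical helper name, and the numbers in the usage lines are made-up toy values, not measurements.

```python
from typing import Sequence, Tuple


def risk_coverage(
    confidences: Sequence[float],
    correct: Sequence[bool],
    threshold: float,
) -> Tuple[float, float]:
    """Return (coverage, risk) when answering only if confidence >= threshold.

    coverage: fraction of questions that get an answer.
    risk: error rate among the answered questions (0.0 if none answered).
    """
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need equally sized, non-empty inputs")
    answered = [ok for c, ok in zip(confidences, correct) if c >= threshold]
    coverage = len(answered) / len(confidences)
    risk = 1.0 - sum(answered) / len(answered) if answered else 0.0
    return coverage, risk


# Toy, illustrative values only.
toy_conf = [0.95, 0.90, 0.80, 0.60, 0.40, 0.30]
toy_ok = [True, True, True, False, True, False]

for t in (0.0, 0.5, 0.7):
    cov, risk = risk_coverage(toy_conf, toy_ok, t)
    print(f"threshold={t:.1f} coverage={cov:.2f} risk={risk:.2f}")
```

Sweeping the threshold over an evaluation set gives the risk-coverage curve. Scoring a blanket 'say I don't know' prompt on the same set yields a single (coverage, risk) point that can be compared against that curve.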
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T14:10:22.556862+00:00— report_created — created