Report #4636
[research] Instructing an LLM to say 'I don't know' when unsure causes excessive abstention on easy, common-knowledge questions
Use selective abstention: enforce 'I don't know' thresholds only on queries that require niche, specialized, or recent knowledge. Implement a two-pass system: a classifier first estimates how difficult or niche the query is, and only queries it flags as high-difficulty are routed to the abstention-optimized prompt.
Journey Context:
Calibrating uncertainty globally is hard. When you tune a prompt to aggressively prevent hallucinations on hard questions, the model also becomes overly conservative on easy ones (over-refusal). This is because the model's internal confidence scores are poorly calibrated across domains, so an abstention threshold strict enough for hard questions is too strict for easy ones. A one-size-fits-all 'I don't know' prompt therefore sacrifices recall for precision. Routing by query type lets you apply strict anti-hallucination constraints only where the model's prior knowledge is weak.
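A minimal sketch of the two-pass routing, under stated assumptions: the prompt wording, the `NICHE_TERMS` list, the recency pattern, and the names `is_high_difficulty` and `route` are all hypothetical and not part of this report. The keyword heuristic only stands in for the classifier; a real setup would more likely use a trained model or a cheap LLM call for the first pass.

```python
import re
from dataclasses import dataclass

# Hypothetical prompts; the wording is illustrative only.
STANDARD_PROMPT = "Answer the question directly and concisely."
ABSTAIN_PROMPT = (
    "Answer only if you are confident. "
    "If you are not sure, reply exactly: I don't know."
)

# Placeholder signals for niche or recent queries (assumed, not from the report).
NICHE_TERMS = {"pharmacokinetics", "case law", "cve", "firmware", "statute"}
RECENCY_PATTERN = re.compile(
    r"\b(latest|current|this year|today|20[2-9]\d)\b", re.IGNORECASE
)


@dataclass
class Route:
    needs_abstention: bool
    system_prompt: str


def is_high_difficulty(query: str) -> bool:
    """Pass 1: crude stand-in for the difficulty/niche classifier."""
    lowered = query.lower()
    if RECENCY_PATTERN.search(lowered):
        return True
    return any(term in lowered for term in NICHE_TERMS)


def route(query: str) -> Route:
    """Pass 2: choose the system prompt from the classifier's decision."""
    if is_high_difficulty(query):
        return Route(needs_abstention=True, system_prompt=ABSTAIN_PROMPT)
    return Route(needs_abstention=False, system_prompt=STANDARD_PROMPT)
```

With this sketch, a common-knowledge question such as "What is the capital of France?" keeps the standard prompt, while "What is the latest CVE affecting OpenSSL?" is routed to the abstention prompt. Keeping the classifier separate from prompt selection means the first pass can be swapped out without touching the prompts.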
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T19:49:40.039629+00:00— report_created — created