Report #6055
[research] LLM either over-refuses or under-refuses when prompted to admit ignorance
Implement selective question answering with a two-pass architecture:
1) A calibrated verifier/retriever checks whether sufficient evidence exists.
2) The generator answers only if that evidence passes a threshold; otherwise the system abstains.
Avoid using a single LLM prompt to both assess knowledge and generate the answer.
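A minimal Python sketch of the two-pass control flow. `selective_answer`, `Decision`, and the `toy_*` components are hypothetical names for illustration. The lexical-overlap `toy_verify` is only a placeholder for a real calibrated verifier (for example, a trained scoring model), and `toy_generate` stands in for an LLM call. The property being illustrated is that the abstain decision is computed before, and independently of, the generator.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical component signatures: any retriever, calibrated verifier, and
# generator (e.g. an LLM call) could be plugged in behind these callables.
Retriever = Callable[[str], List[str]]
Verifier = Callable[[str, List[str]], float]  # assumed to return a score in [0, 1]
Generator = Callable[[str, List[str]], str]


@dataclass
class Decision:
    answered: bool
    answer: Optional[str]
    score: float


def selective_answer(question: str, retrieve: Retriever, verify: Verifier,
                     generate: Generator, threshold: float) -> Decision:
    # Pass 1: collect evidence and score whether it is sufficient.
    # The generator is never consulted for this decision.
    evidence = retrieve(question)
    score = verify(question, evidence)
    if score < threshold:
        # Abstain before any answer text exists.
        return Decision(answered=False, answer=None, score=score)
    # Pass 2: generate only from evidence that cleared the threshold.
    return Decision(answered=True, answer=generate(question, evidence), score=score)


# --- Toy stand-ins for illustration only (not a calibrated verifier) ---
CORPUS = [
    "Paris is the capital of France.",
    "The Pacific is the largest ocean on Earth.",
]
STOPWORDS = {"the", "is", "of", "what", "on", "a", "which"}


def _terms(text: str) -> set:
    # Lowercased content words with surrounding punctuation removed.
    return {w.strip("?.,").lower() for w in text.split()} - STOPWORDS


def toy_retrieve(question: str) -> List[str]:
    # Return every passage sharing at least one content word with the question.
    q = _terms(question)
    return [doc for doc in CORPUS if q & _terms(doc)]


def toy_verify(question: str, evidence: List[str]) -> float:
    # Fraction of the question's content words covered by the best passage.
    q = _terms(question)
    if not q or not evidence:
        return 0.0
    return max(len(q & _terms(doc)) / len(q) for doc in evidence)


def toy_generate(question: str, evidence: List[str]) -> str:
    # Placeholder for an LLM call conditioned on the evidence.
    return evidence[0]


if __name__ == "__main__":
    for q in ["What is the capital of France?",
              "What is the capital of Spain?",
              "Who painted the Mona Lisa?"]:
        d = selective_answer(q, toy_retrieve, toy_verify, toy_generate, threshold=0.6)
        print(q, "->", d.answer if d.answered else "I don't know", f"(score={d.score:.2f})")
```

With `threshold=0.6`, the France question is answered; the Spain question abstains even though retrieval found a passage, because the verifier sees only half of the question covered; and the Mona Lisa question abstains because nothing was retrieved.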
Journey Context:
Simply prompting an LLM with 'Say I don't know if you are unsure' is unreliable because the model's sense of its own knowledge boundaries is poor: it often *feels* confident about its hallucinations. Conversely, prompting it aggressively to refuse when unsure causes catastrophic drops in coverage, so it also declines questions it actually knows. Decoupling the decision to abstain from the generation process lets the precision/recall tradeoff of factuality (how many answers are correct vs. how many questions get answered) be tuned independently, for example by moving the verifier threshold without changing the generator.
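A minimal sketch of that tuning step, assuming a verifier score and a correctness label for the generator's answer have been logged for each question in a labeled dev set. `sweep` and `pick_threshold` are hypothetical helpers. Here coverage (fraction of questions answered) stands in for the recall side of the tradeoff and selective accuracy (fraction of answered questions that are correct) for the precision side. The dev-set values in the usage block are invented for illustration.

```python
from typing import List, Optional, Tuple

# Each item: (verifier score, whether the generator's answer was judged correct),
# collected once on a labeled dev set. Changing the threshold only re-filters
# these pairs, so no model needs to be re-run to explore the tradeoff.
Scored = List[Tuple[float, bool]]


def sweep(scored: Scored, thresholds: List[float]) -> List[Tuple[float, float, float]]:
    """Return (threshold, coverage, selective_accuracy) for each threshold."""
    n = len(scored)
    rows = []
    for t in thresholds:
        answered = [ok for s, ok in scored if s >= t]
        coverage = len(answered) / n if n else 0.0
        # With nothing answered there are no wrong answers; report 1.0 by convention.
        accuracy = sum(answered) / len(answered) if answered else 1.0
        rows.append((t, coverage, accuracy))
    return rows


def pick_threshold(scored: Scored, thresholds: List[float],
                   min_accuracy: float) -> Optional[float]:
    """Lowest threshold (hence highest coverage) meeting the accuracy target."""
    for t, cov, acc in sorted(sweep(scored, thresholds)):
        # Skip thresholds that answer nothing; their accuracy is only vacuous.
        if cov > 0 and acc >= min_accuracy:
            return t
    return None


if __name__ == "__main__":
    # Made-up dev-set values for illustration only.
    dev = [(0.95, True), (0.90, True), (0.80, True), (0.70, False),
           (0.60, True), (0.40, False), (0.30, False), (0.20, True)]
    grid = [0.0, 0.5, 0.75, 0.9]
    for t, cov, acc in sweep(dev, grid):
        print(f"threshold={t:.2f} coverage={cov:.3f} selective_accuracy={acc:.3f}")
    print("threshold for >=80% selective accuracy:", pick_threshold(dev, grid, 0.8))
```

On this toy data, raising the threshold from 0.0 to 0.5 lifts selective accuracy from 0.625 to 0.8 while coverage falls from 1.0 to 0.625; the generator is untouched throughout, which is the independence the two-pass design is meant to provide.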
Lifecycle
2026-06-15T23:06:08.911040+00:00 - report_created - created