Report #36830
[research] Model answers questions it shouldn't, rather than abstaining, leading to high hallucination rates on out-of-distribution queries
Implement selective question answering: prompt the model to explicitly output 'I don't know' when uncertain, and calibrate an abstention threshold on the model's answer confidence (derived from its token logit probabilities) against a validation set, choosing the threshold that maximizes F1 while keeping the hallucination rate low.
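A minimal sketch of the abstention decision described above, assuming the serving stack returns per-token log-probabilities for the generated answer. The prompt wording, the function names, and the use of the geometric-mean token probability as the confidence score are illustrative assumptions, not part of the original report.

```python
import math

ABSTAIN = "I don't know"

# Hypothetical prompt wording; adjust to the model and task.
PROMPT_TEMPLATE = (
    "Answer the question. If you are not sure of the answer, "
    "reply exactly: I don't know\n\nQuestion: {question}\nAnswer:"
)


def answer_confidence(token_logprobs):
    """Geometric-mean token probability of the generated answer, in [0, 1].

    Assumption: the average token log-probability is used as a confidence proxy.
    """
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def selective_answer(answer, token_logprobs, threshold):
    """Return the answer only if the model did not abstain on its own
    and its confidence clears the calibrated threshold."""
    # Respect an explicit abstention produced by the prompt.
    if answer.strip().lower().startswith(ABSTAIN.lower()):
        return ABSTAIN
    # Otherwise abstain on the model's behalf when confidence is too low.
    if answer_confidence(token_logprobs) < threshold:
        return ABSTAIN
    return answer
```

The threshold passed to `selective_answer` is the value that would be calibrated on the validation set; the prompt handles the cases where the model recognizes its own uncertainty, and the threshold covers the cases where it does not.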
Journey Context:
LLMs have a strong completion drive: by default they attempt an answer even when they lack the knowledge to give one. Simply asking them to say 'I don't know' helps, but models are poorly calibrated (they are overconfident), so the prompt alone does not reliably trigger abstention. The tradeoff is coverage vs. accuracy. By tuning the probability threshold for abstention on an eval like TruthfulQA, you can systematically trade a small percentage of correct answers for a large reduction in hallucinations.
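The coverage vs. accuracy tradeoff can be made concrete with a threshold sweep over a labeled validation set. The sketch below is one possible reading of "maximize F1 while minimizing hallucination": it picks the threshold with the best F1 among those whose hallucination rate stays under a cap. The function name, the record format, the cap parameter, and the F1 definition (precision over answered questions, recall over all questions) are assumptions for illustration, and the validation data is made up.

```python
def sweep_thresholds(records, max_hallucination_rate):
    """Pick an abstention threshold from validation data.

    records: list of (confidence, is_correct) pairs, one per validation question.
    Returns the best operating point as a dict, or None if no threshold
    satisfies the hallucination cap (or records is empty).
    """
    n = len(records)
    best = None
    # Candidate thresholds are the observed confidences, so every candidate
    # answers at least one question.
    for t in sorted({conf for conf, _ in records}):
        answered = [ok for conf, ok in records if conf >= t]
        correct = sum(1 for ok in answered if ok)
        wrong = len(answered) - correct
        hallucination_rate = wrong / n  # wrong answers over all questions
        if hallucination_rate > max_hallucination_rate:
            continue
        precision = correct / len(answered)  # accuracy on answered questions
        recall = correct / n                 # correct answers over all questions
        f1 = 0.0 if correct == 0 else 2 * precision * recall / (precision + recall)
        if best is None or f1 > best["f1"]:
            best = {
                "threshold": t,
                "coverage": len(answered) / n,
                "precision": precision,
                "hallucination_rate": hallucination_rate,
                "f1": f1,
            }
    return best


# Made-up validation data: (confidence, answer was correct).
validation = [
    (0.95, True), (0.90, True), (0.85, True), (0.70, False),
    (0.60, True), (0.40, False), (0.30, False), (0.20, False),
]
chosen = sweep_thresholds(validation, max_hallucination_rate=0.15)
```

On this toy data, answering everything gives a hallucination rate of 0.5, while the chosen threshold of 0.60 answers 5 of 8 questions with a hallucination rate of 0.125. Tightening the cap to 0.0 moves the threshold to 0.85 and drops coverage to 3 of 8, which is the trade described above.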
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T16:17:36.074631+00:00— report_created — created