Report #68751
[research] Confidently answering questions that lack sufficient information in the context or training data
Implement calibrated abstention. If the model's probability for its top answer (the softmax over its output logits) falls below a tuned threshold, or if the retrieved RAG context does not contain the answer, return a structured 'I don't know' or 'Insufficient context' response instead of guessing.
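A minimal sketch of the abstention rule described above, assuming the caller already has one logit per candidate answer and a separate check for whether the retrieved context contains the answer. The names `Answer`, `answer_or_abstain`, and `context_has_answer`, and the default threshold of 0.7, are hypothetical and chosen only for illustration; they are not taken from any particular library.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Answer:
    status: str                 # "answered" or "abstained"
    text: Optional[str]         # the answer text, or None when abstaining
    confidence: float           # top softmax probability (0.0 if not computed)
    reason: Optional[str] = None


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def answer_or_abstain(
    candidates: Sequence[str],
    logits: Sequence[float],
    threshold: float = 0.7,          # assumed value; tune on held-out data
    context_has_answer: bool = True,  # assumed to come from a retrieval check
) -> Answer:
    # Abstain immediately when the retrieved context lacks the answer.
    if not context_has_answer:
        return Answer("abstained", None, 0.0, "Insufficient context")

    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)

    # Abstain when the top answer's probability is below the threshold.
    if probs[best] < threshold:
        return Answer("abstained", None, probs[best], "I don't know")

    return Answer("answered", candidates[best], probs[best])
```

For example, `answer_or_abstain(["Paris", "Lyon"], [5.0, 0.0])` answers because the top probability is well above the threshold, while near-equal logits such as `[0.1, 0.0]` lead to an abstention.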
Journey Context:
Standard RLHF tends to penalize non-answers, which gives models a strong bias toward generating a response rather than abstaining. In high-stakes domains, however, a wrong answer is worse than no answer. Tuning the abstention threshold is critical: set it too high and the model abstains so often that it is useless; set it too low and it answers when unsure and hallucinates. Fine-tuning on datasets that include unanswerable examples significantly improves this calibration.
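One way to make the threshold trade-off concrete is to pick the threshold that maximizes a utility score on a labeled validation set that includes unanswerable questions. The sketch below assumes each example is a `(confidence, is_correct)` pair, that abstaining scores 0, a correct answer scores +1, and a wrong answer costs `wrong_penalty`. The function names and the penalty value are hypothetical; the right cost depends on the domain.

```python
from typing import List, Sequence, Tuple

Example = Tuple[float, bool]  # (top-answer confidence, was the answer correct?)


def utility(threshold: float, examples: Sequence[Example],
            wrong_penalty: float = 4.0) -> float:
    # Abstentions (confidence below threshold) contribute 0.
    score = 0.0
    for confidence, is_correct in examples:
        if confidence < threshold:
            continue
        score += 1.0 if is_correct else -wrong_penalty
    return score


def tune_threshold(examples: Sequence[Example],
                   wrong_penalty: float = 4.0) -> float:
    # Only thresholds equal to an observed confidence can change the outcome;
    # infinity represents "always abstain".
    candidates: List[float] = sorted({c for c, _ in examples}) + [float("inf")]
    return max(candidates, key=lambda t: utility(t, examples, wrong_penalty))


# Toy validation set; for an unanswerable question, any answer counts as wrong.
validation = [(0.95, True), (0.90, True), (0.60, False),
              (0.55, True), (0.30, False)]
best = tune_threshold(validation)
```

Raising `wrong_penalty` pushes the chosen threshold upward (more abstention), and lowering it pushes the threshold downward (more answers, more risk of hallucination).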
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:52:59.990224+00:00— report_created — created