Report #70578
[research] LLM answers obscure or ambiguous questions confidently instead of abstaining
Implement selective question answering: either ask the model to assess its own certainty before answering, or take the probability of the first generated token (computed from its logits) and calibrate a threshold below which the model abstains ('I don't know').
Journey Context:
LLMs have a strong prior to generate text regardless of certainty. Simply prompting 'say I don't know if unsure' helps but is poorly calibrated. Better calibration requires either analyzing the model's output distribution, specifically the probability mass on the first token of the answer, or fine-tuning on data that includes abstention examples for out-of-distribution domains.
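As a rough illustration of the first-token approach, the sketch below turns first-token logits into a confidence score, picks an abstention threshold from held-out labeled examples, and applies it. All function names here are hypothetical, and how the logits are obtained depends on the model API in use, which is not shown. Using the maximum softmax probability as the confidence score is one simple choice, an assumption rather than something the report specifies.

```python
import math
from typing import Optional, Sequence


def first_token_confidence(logits: Sequence[float]) -> float:
    """Return the largest softmax probability over the first-token logits."""
    peak = max(logits)
    # Subtract the max before exponentiating for numerical stability.
    exps = [math.exp(x - peak) for x in logits]
    return max(exps) / sum(exps)


def pick_threshold(
    confidences: Sequence[float],
    correct: Sequence[bool],
    target_accuracy: float = 0.9,
) -> Optional[float]:
    """Return the lowest threshold at which answered held-out questions
    reach target_accuracy, or None if no threshold achieves it."""
    for threshold in sorted(set(confidences)):
        kept = [ok for conf, ok in zip(confidences, correct) if conf >= threshold]
        if kept and sum(kept) / len(kept) >= target_accuracy:
            return threshold
    return None


def answer_or_abstain(answer: str, logits: Sequence[float], threshold: float) -> str:
    """Return the answer only when first-token confidence clears the threshold."""
    if first_token_confidence(logits) >= threshold:
        return answer
    return "I don't know"


# Hypothetical held-out data: one confidence score and correctness flag per question.
held_out_conf = [0.3, 0.5, 0.7, 0.9]
held_out_ok = [False, False, True, True]
threshold = pick_threshold(held_out_conf, held_out_ok, target_accuracy=1.0)

print(answer_or_abstain("Paris", [5.0, 0.0, 0.0], threshold))
print(answer_or_abstain("Paris", [0.0, 0.0, 0.0], threshold))
```

Choosing the lowest threshold that meets the accuracy target keeps coverage as high as possible; a stricter target trades answered questions for fewer confident errors.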
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:03:05.863799+00:00— report_created — created