Report #52839
[research] LLM answers obscure or ambiguous questions incorrectly instead of abstaining or saying 'I don't know'
Implement selective answering (abstention). Fine-tune the model to output a specific [UNANSWERABLE] token when the max probability of the answer token falls below a calibrated threshold, or use a verifier model to score the generation.
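A minimal sketch of the threshold rule, applied at inference time rather than through fine-tuning. It assumes a held-out set of (confidence, was-correct) pairs is available, where the confidence is the max probability of the answer token. The helper names `calibrate_threshold` and `selective_answer` and the `target_accuracy` parameter are hypothetical and chosen for illustration only.

```python
from typing import Sequence

UNANSWERABLE = "[UNANSWERABLE]"


def calibrate_threshold(
    confidences: Sequence[float],
    correct: Sequence[bool],
    target_accuracy: float,
) -> float:
    """Return the lowest confidence threshold at which the answered subset
    of the held-out set still reaches target_accuracy.

    Returns infinity (abstain on everything) if no threshold qualifies.
    """
    pairs = sorted(zip(confidences, correct), key=lambda p: p[0], reverse=True)
    best = float("inf")
    hits = 0
    for i, (conf, ok) in enumerate(pairs, start=1):
        hits += int(ok)
        # Only evaluate at the end of a group of tied confidences, because a
        # threshold admits every example that shares that confidence.
        at_group_end = i == len(pairs) or pairs[i][0] != conf
        if at_group_end and hits / i >= target_accuracy:
            best = conf
    return best


def selective_answer(answer: str, max_prob: float, threshold: float) -> str:
    """Return the answer if its confidence clears the threshold, else abstain."""
    return answer if max_prob >= threshold else UNANSWERABLE


# Toy held-out data (illustrative values, not measured results).
held_out_conf = [0.95, 0.9, 0.8, 0.6, 0.4]
held_out_correct = [True, True, False, True, False]
threshold = calibrate_threshold(held_out_conf, held_out_correct, target_accuracy=0.75)

print(selective_answer("Paris", 0.97, threshold))
print(selective_answer("a guessed name", 0.41, threshold))
```

Lowering `target_accuracy` trades accuracy on answered questions for coverage, so the threshold should be re-calibrated whenever the model or the question distribution changes.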
Journey Context:
Standard LLMs are trained to always provide an answer, which leads to hallucination on the long tail of their training data. Simply prompting 'say I don't know if you aren't sure' tends to cause over-abstention on easy questions and under-abstention on hard ones. Training an explicit abstention boundary or using a learned verifier decouples the 'ability to answer' from the 'decision to answer.'
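The decoupling can be sketched as two separate callables: one produces a candidate answer, the other decides whether to release it. This is an illustration under stated assumptions, not a specific library's API. `generate`, `verify`, and `min_score` are hypothetical names, and the stub functions stand in for a real generator and a learned verifier.

```python
from typing import Callable

UNANSWERABLE = "[UNANSWERABLE]"


def answer_with_verifier(
    question: str,
    generate: Callable[[str], str],
    verify: Callable[[str, str], float],
    min_score: float = 0.5,
) -> str:
    """Generate a candidate, then let a separate verifier decide whether
    to return it or abstain."""
    candidate = generate(question)      # ability to answer
    score = verify(question, candidate)  # decision to answer
    return candidate if score >= min_score else UNANSWERABLE


# Stubs standing in for real models (assumptions for illustration).
def stub_generate(question: str) -> str:
    return "Paris" if "France" in question else "a plausible guess"


def stub_verify(question: str, candidate: str) -> float:
    return 0.9 if candidate == "Paris" else 0.1


print(answer_with_verifier("What is the capital of France?", stub_generate, stub_verify))
print(answer_with_verifier("Who was the third mayor of an obscure town?", stub_generate, stub_verify))
```

Because the verifier is a separate component, its cutoff can be tuned or retrained without touching the generator.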
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T19:11:17.775926+00:00— report_created — created