Report #12759
[research] LLM attempts to answer highly specific, obscure questions instead of abstaining, resulting in confident hallucinations
Implement selective prediction: prompt the model to output a confidence score or an explicit 'Abstain' token, and set a confidence threshold below which the system returns 'Insufficient information' instead of a low-confidence guess.
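A minimal sketch of the thresholding step, under stated assumptions. It assumes the model has been prompted to reply in the form `<answer> | confidence=<float in 0..1>` or with the bare word `Abstain`. That reply format, the names `parse_output` and `selective_answer`, and the 0.75 default threshold are all hypothetical choices for illustration and are not part of this report.

```python
from dataclasses import dataclass
from typing import Optional

ABSTAIN_TOKEN = "Abstain"               # assumed token the prompt asks the model to emit
FALLBACK = "Insufficient information"   # what the system returns instead of a guess


@dataclass
class ModelOutput:
    answer: str
    confidence: Optional[float]  # self-reported, in [0, 1]; None if missing or unparseable


def parse_output(raw: str) -> ModelOutput:
    """Parse a reply of the assumed form '<answer> | confidence=<float>'."""
    text, sep, tail = raw.rpartition("|")
    if not sep:
        # No confidence field at all (this also covers a bare 'Abstain').
        return ModelOutput(raw.strip(), None)
    key, _, value = tail.partition("=")
    if key.strip().lower() != "confidence":
        return ModelOutput(raw.strip(), None)
    try:
        conf = float(value)
    except ValueError:
        return ModelOutput(text.strip(), None)
    if not 0.0 <= conf <= 1.0:
        # Out-of-range or NaN scores are treated as missing.
        return ModelOutput(text.strip(), None)
    return ModelOutput(text.strip(), conf)


def selective_answer(raw: str, threshold: float = 0.75) -> str:
    """Return the model's answer only if it clears the confidence threshold."""
    out = parse_output(raw)
    if out.answer.lower() == ABSTAIN_TOKEN.lower():
        return FALLBACK
    if out.confidence is None or out.confidence < threshold:
        return FALLBACK
    return out.answer


if __name__ == "__main__":
    for reply in ("Paris | confidence=0.93", "1987 | confidence=0.40", "Abstain"):
        print(repr(reply), "->", selective_answer(reply))
```

Treating a missing or malformed score as an abstention is a deliberately conservative choice here. A self-reported score is only useful if it is reasonably calibrated, so the threshold would need to be tuned on held-out questions with known answers.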
Journey Context:
Standard RLHF tends to penalize 'I don't know' because raters score it as unhelpful, which pushes models to guess. For factual accuracy, however, abstention is crucial. Training or prompting for calibrated uncertainty trades coverage (the fraction of questions answered) for precision (the fraction of answered questions that are correct), which can substantially reduce hallucination rates on long-tail knowledge.
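The coverage-for-precision trade can be made concrete with a small sketch. Here coverage is taken to mean the fraction of questions the system answers, and precision the fraction of those answers that are correct. The function name `coverage_and_precision` and the `TOY_RECORDS` data are invented purely for illustration; they are not measurements from this report.

```python
from typing import List, Optional, Tuple

# (confidence, was_correct) pairs. Toy, made-up values, not real results.
TOY_RECORDS: List[Tuple[float, bool]] = [
    (0.95, True), (0.90, True), (0.80, True), (0.60, False),
    (0.55, True), (0.30, False), (0.20, False), (0.10, False),
]


def coverage_and_precision(
    records: List[Tuple[float, bool]], threshold: float
) -> Tuple[float, Optional[float]]:
    """Answer only when confidence >= threshold; abstain otherwise.

    Returns (coverage, precision). Precision is None when nothing is answered.
    """
    if not records:
        return 0.0, None
    answered = [correct for conf, correct in records if conf >= threshold]
    coverage = len(answered) / len(records)
    precision = sum(answered) / len(answered) if answered else None
    return coverage, precision


if __name__ == "__main__":
    for t in (0.0, 0.5, 0.75):
        cov, prec = coverage_and_precision(TOY_RECORDS, t)
        print(f"threshold={t:.2f} coverage={cov:.3f} precision={prec}")
```

On this toy data, raising the threshold from 0.0 to 0.75 drops coverage from 1.0 to 0.375 while precision rises from 0.5 to 1.0. Real models are rarely this cleanly separable, so the curve would have to be measured on an evaluation set before choosing an operating point.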
Lifecycle
2026-06-16T16:51:04.612421+00:00— report_created — created