Report #6462
[research] LLM hallucinates an answer instead of abstaining when it lacks sufficient knowledge
Calibrate the model's confidence threshold so that low-confidence answers are withheld rather than emitted. Also instruct the model explicitly: 'If you are not sure, or the information is not available, respond with "I do not have enough information to answer this accurately."'
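A minimal sketch of both halves of this workaround, assuming the caller already has some confidence score for each answer (how that score is obtained is not specified in this report). The names `build_prompt`, `apply_threshold`, `is_abstention`, and the 0.75 default are hypothetical and chosen only for illustration; no real model API is called.

```python
# Abstention sentence taken verbatim from the workaround text.
ABSTAIN_MESSAGE = "I do not have enough information to answer this accurately."


def build_prompt(question: str) -> str:
    """Prepend the abstention instruction to a user question."""
    return (
        "If you are not sure, or the information is not available, "
        f"respond with: {ABSTAIN_MESSAGE}\n\n"
        f"Question: {question}"
    )


def apply_threshold(answer: str, confidence: float, threshold: float = 0.75) -> str:
    """Return the answer only if its confidence clears the threshold.

    `confidence` is assumed to be a score in [0, 1] supplied by the caller;
    the 0.75 default is an arbitrary placeholder, not a recommended value.
    """
    if confidence < threshold:
        return ABSTAIN_MESSAGE
    return answer


def is_abstention(response: str) -> bool:
    """Detect whether a response contains the abstention sentence."""
    return ABSTAIN_MESSAGE.lower() in response.lower()
```

Keeping the abstention sentence in one constant means the prompt, the threshold fallback, and the detector cannot drift apart, which makes abstentions easy to count downstream.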
Journey Context:
Standard RLHF tends to penalize 'I don't know' responses because human raters prefer helpful, substantive answers. This trains the model to guess rather than abstain, which increases hallucination rates. Allowing abstention and explicitly rewarding it, either during alignment or via prompt engineering, improves precision at the cost of recall: the model answers fewer questions, but more of the answers it does give are correct. This is usually the right tradeoff for factual or high-stakes tasks.
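The precision/recall tradeoff can be illustrated with a small sketch. It assumes precision means "correct answers divided by answers given" and recall means "correct answers divided by all questions asked"; the report does not define either term, so both definitions are assumptions. The `TOY` data is made up purely to show the arithmetic and is not a measurement.

```python
from typing import List, Tuple


def precision_recall(
    preds: List[Tuple[float, bool]], threshold: float
) -> Tuple[float, float]:
    """Compute precision and recall when answers below `threshold` are withheld.

    Each item in `preds` is (confidence, was_correct). Items with confidence
    below the threshold count as abstentions.
    """
    answered = [correct for conf, correct in preds if conf >= threshold]
    n_correct = sum(answered)
    # If nothing was answered, precision is reported as 0.0 by convention here.
    precision = n_correct / len(answered) if answered else 0.0
    recall = n_correct / len(preds) if preds else 0.0
    return precision, recall


# Hypothetical (confidence, was_correct) pairs, invented for illustration.
TOY = [
    (0.95, True), (0.90, True), (0.80, True), (0.60, False),
    (0.55, True), (0.40, False), (0.30, False), (0.20, False),
]

print(precision_recall(TOY, threshold=0.0))  # (0.5, 0.5): always answers
print(precision_recall(TOY, threshold=0.7))  # (1.0, 0.375): abstains when unsure
```

On this toy data, raising the threshold removes every wrong answer but also withholds one correct one, which is the tradeoff described above.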
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T00:11:21.685104+00:00— report_created — created