Report #92351
[research] LLM hallucinates an answer rather than admitting ignorance when it lacks specific knowledge
Implement a 'selective QA' pipeline: prompt the model to output a dedicated 'UNANSWERABLE' token when a query exceeds its knowledge boundary, and fine-tune or condition it on examples where abstention is the correct target.
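A minimal sketch of the prompting and data-preparation side of such a pipeline follows. The instruction wording, the helper names (build_prompt, parse_answer, selective_answer, make_training_example) and the prompt/completion record layout are assumptions made for illustration. The generate argument stands in for whatever model call is actually used.

```python
from typing import Callable, Optional

ABSTAIN_TOKEN = "UNANSWERABLE"  # sentinel named in the report


def build_prompt(question: str) -> str:
    # Hypothetical instruction wording; adjust for the model in use.
    return (
        "Answer the question. If you do not know the answer, "
        f"reply with exactly {ABSTAIN_TOKEN}.\n\n"
        f"Question: {question}\nAnswer:"
    )


def parse_answer(raw: str) -> Optional[str]:
    """Return None when the model abstained, otherwise the stripped answer."""
    text = raw.strip()
    if text.upper().startswith(ABSTAIN_TOKEN):
        return None
    return text


def selective_answer(question: str, generate: Callable[[str], str]) -> Optional[str]:
    """Run one query through a caller-supplied generate(prompt) -> text function."""
    return parse_answer(generate(build_prompt(question)))


def make_training_example(question: str, gold: Optional[str]) -> dict:
    """Build a fine-tuning pair; gold=None marks a query the model should abstain on."""
    completion = gold if gold is not None else ABSTAIN_TOKEN
    return {"prompt": build_prompt(question), "completion": completion}
```

Callers treat a None result as "no answer" and can fall back to retrieval or a human. Which questions get gold=None has to be decided per model, for example by probing what the base model already answers correctly.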
Journey Context:
Standard RLHF training tends to penalize abstention, implicitly teaching the model that any answer is better than 'I don't know.' This encourages hallucination on long-tail or out-of-distribution facts. Explicitly rewarding abstention on unknown queries and providing a distinct escape token lets the model separate high-confidence generations from speculative ones, improving precision on the questions it does answer at the cost of some recall.
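The trade-off described above can be made concrete with a small scoring sketch. The reward values, the exact-match comparison and the function names are assumptions, not part of the report. A prediction of None means the model abstained, and a gold answer of None marks a query treated as outside the model's knowledge.

```python
from typing import Optional, Sequence, Tuple

Record = Tuple[Optional[str], Optional[str]]  # (prediction, gold)


def _matches(pred: str, gold: Optional[str]) -> bool:
    # Assumed scoring rule: case-insensitive exact match.
    return gold is not None and pred.strip().lower() == gold.strip().lower()


def abstention_reward(pred: Optional[str], gold: Optional[str],
                      wrong_penalty: float = 1.0) -> float:
    """Reward correct answers and correct abstentions; penalize wrong answers."""
    if pred is None:
        # Abstaining is rewarded on unknown queries, neutral on known ones.
        return 1.0 if gold is None else 0.0
    return 1.0 if _matches(pred, gold) else -wrong_penalty


def selective_metrics(records: Sequence[Record]) -> dict:
    """Precision over answered queries, recall over answerable ones, and coverage."""
    answered = [(p, g) for p, g in records if p is not None]
    correct = sum(1 for p, g in answered if _matches(p, g))
    answerable = sum(1 for _, g in records if g is not None)
    return {
        "precision": correct / len(answered) if answered else 0.0,
        "recall": correct / answerable if answerable else 0.0,
        "coverage": len(answered) / len(records) if records else 0.0,
    }
```

Under this scheme a model that abstains more often loses recall and coverage on questions it could have answered, but each abstention that replaces a wrong answer raises precision.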
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:36:08.792582+00:00— report_created — created