Report #75351
[research] Forcing the model to answer every question, leading to hallucinations on unknown topics
Implement a 'selective prediction' threshold. Prompt the model to output a specific token (e.g., 'UNANSWERABLE') if the query is outside its knowledge scope or the provided context. Fine-tune on examples of unanswerable questions to establish the abstention boundary.
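A minimal sketch of the selective-prediction idea, under stated assumptions: the `generate` callable is a hypothetical model interface that returns the generated text plus per-token log-probabilities, the confidence score (geometric mean of token probabilities) is only one possible choice, and the 0.75 threshold is an illustrative value that would need tuning on held-out data. None of these names come from the report.

```python
import math
from typing import Callable, List, Tuple

ABSTAIN_TOKEN = "UNANSWERABLE"  # sentinel token suggested in the report

SYSTEM_PROMPT = (
    "Answer using only the provided context. If the answer is not in the "
    f"context or you do not know it, reply with exactly {ABSTAIN_TOKEN}."
)

# Hypothetical interface (assumption): (system_prompt, user_prompt) ->
# (generated_text, per_token_logprobs). Adapt to whatever model API is in use.
GenerateFn = Callable[[str, str], Tuple[str, List[float]]]


def geometric_mean_prob(logprobs: List[float]) -> float:
    """Geometric mean of token probabilities, i.e. exp(mean log-probability)."""
    if not logprobs:
        return 0.0  # no evidence of confidence, so treat as lowest score
    return math.exp(sum(logprobs) / len(logprobs))


def selective_answer(
    generate: GenerateFn,
    question: str,
    context: str,
    threshold: float = 0.75,  # illustrative value, not from the report
) -> str:
    """Return the model's answer, or ABSTAIN_TOKEN when it should abstain."""
    user_prompt = f"Context:\n{context}\n\nQuestion: {question}"
    text, logprobs = generate(SYSTEM_PROMPT, user_prompt)

    # Case 1: the model itself emitted the sentinel (prompted abstention).
    if ABSTAIN_TOKEN in text:
        return ABSTAIN_TOKEN

    # Case 2: the model answered, but its confidence is below the threshold.
    if geometric_mean_prob(logprobs) < threshold:
        return ABSTAIN_TOKEN

    return text.strip()
```

The two checks are complementary: the sentinel covers cases where the model recognizes the gap itself, and the threshold is a fallback for answers produced with low token-level confidence. Token probability is an imperfect proxy for correctness, so the threshold should be validated before being relied on.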
Journey Context:
LLMs are trained to be helpful, which creates a strong bias toward generating an answer even when they lack the knowledge. The 'I don't know' behavior is not innate; it must be explicitly trained or prompted. Without an abstention mechanism, the model will interpolate from related training data, resulting in confident hallucinations rather than safe abstention.
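Since abstention has to be trained explicitly, one way to prepare fine-tuning data is to mix ordinary answerable examples with unanswerable ones whose target is the sentinel. The sketch below assumes a simple prompt/completion JSONL layout; the field names, the `format_prompt` helper, and the example questions are all hypothetical and should be adapted to the training framework actually used.

```python
import json
import random
from typing import Dict, List, Tuple

ABSTAIN_TOKEN = "UNANSWERABLE"  # sentinel token suggested in the report


def format_prompt(context: str, question: str) -> str:
    """Hypothetical prompt layout; keep it identical at training and inference."""
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


def build_abstention_dataset(
    answerable: List[Tuple[str, str, str]],   # (context, question, answer)
    unanswerable: List[Tuple[str, str]],      # (context, question)
    seed: int = 0,
) -> List[Dict[str, str]]:
    """Mix answerable rows with rows whose target is the abstention token."""
    rows = [
        {"prompt": format_prompt(c, q), "completion": a}
        for c, q, a in answerable
    ]
    rows += [
        {"prompt": format_prompt(c, q), "completion": ABSTAIN_TOKEN}
        for c, q in unanswerable
    ]
    # Shuffle so abstention examples are not clustered at the end of the file.
    random.Random(seed).shuffle(rows)
    return rows


def to_jsonl(rows: List[Dict[str, str]]) -> str:
    """Serialize rows as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(row) for row in rows)
```

The ratio of unanswerable to answerable rows is a design choice this sketch does not fix: too few abstention examples may leave the boundary unlearned, while too many may push the model to abstain on questions it could answer.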
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T09:04:32.729427+00:00— report_created — created