Report #11764
[research] LLM attempts to answer every question, leading to hallucinations on out-of-distribution or knowledge-boundary queries
Implement selective prediction: compute a confidence score for each answer (from token logprobs or a trained verifier) and compare it to a threshold. If the confidence falls below the threshold, route the query to a default 'I don't know' response or a human-in-the-loop fallback instead of returning a free-form answer.
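A minimal sketch of the logprob variant, assuming the serving API returns one log-probability per generated token. The names `sequence_confidence`, `answer_or_abstain`, and `FALLBACK` are hypothetical, and the geometric mean of token probabilities is only one possible confidence score; a trained verifier could be swapped in for it.

```python
import math
from typing import Sequence

# Hypothetical fallback message; a real system might enqueue the query for human review.
FALLBACK = "I don't know."


def sequence_confidence(token_logprobs: Sequence[float]) -> float:
    """Geometric mean of the token probabilities, a value in [0, 1].

    Assumption: each entry is the natural-log probability of one generated token.
    An empty sequence is treated as zero confidence.
    """
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def answer_or_abstain(answer: str, token_logprobs: Sequence[float], threshold: float) -> str:
    """Return the model's answer only if its confidence reaches the threshold."""
    if sequence_confidence(token_logprobs) >= threshold:
        return answer
    return FALLBACK


if __name__ == "__main__":
    print(answer_or_abstain("Paris", [-0.01, -0.02, -0.03], threshold=0.8))
    print(answer_or_abstain("Atlantis", [-1.5, -2.0, -0.9], threshold=0.8))
```

Averaging the logprobs rather than summing them keeps long answers from being penalized purely for their length. Raw token probabilities are often poorly calibrated, so the threshold value should be chosen on held-out data rather than picked by hand.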
Journey Context:
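LLMs are trained to always complete the sequence and have no inherent 'abstain' mechanism. Letting them answer everything maximizes coverage (every query gets a response) but lowers precision, because low-confidence answers are returned alongside reliable ones. Selective prediction trades coverage for accuracy. By calibrating the threshold on a validation set, you can pick the operating point that reaches a target accuracy (e.g., 95%) on the subset of queries the model chooses to answer. That accuracy is measured on the validation set, so it carries over to production only to the extent that live queries resemble the validation data.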
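The calibration step can be sketched as follows, assuming a labeled validation set that provides one confidence score and one correctness flag per query. The function name `calibrate_threshold` and the sample numbers are hypothetical and for illustration only. The sketch returns the lowest threshold whose answered subset meets the target accuracy, which is the choice that gives the highest coverage.

```python
from typing import Optional, Sequence, Tuple


def calibrate_threshold(
    confidences: Sequence[float],
    correct: Sequence[bool],
    target_accuracy: float,
) -> Optional[Tuple[float, float]]:
    """Return (threshold, coverage) or None if no threshold meets the target.

    A query is answered when its confidence is >= threshold. Among all
    thresholds whose answered subset reaches target_accuracy on the
    validation data, the lowest one (highest coverage) is returned.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")

    # Sort from most to least confident so each prefix is a candidate answered set.
    pairs = sorted(zip(confidences, correct), key=lambda p: p[0], reverse=True)
    total = len(pairs)
    best: Optional[Tuple[float, float]] = None
    hits = 0

    for answered, (conf, ok) in enumerate(pairs, start=1):
        hits += int(ok)
        # Queries with equal confidence are answered together, so only
        # evaluate at the last item of each tie group.
        if answered < total and pairs[answered][0] == conf:
            continue
        if hits / answered >= target_accuracy:
            best = (conf, answered / total)  # later matches have higher coverage

    return best


if __name__ == "__main__":
    # Made-up validation data for illustration.
    scores = [0.95, 0.90, 0.80, 0.60, 0.40]
    labels = [True, True, True, False, False]
    print(calibrate_threshold(scores, labels, target_accuracy=0.95))
```

Accuracy on the answered subset is not guaranteed to rise monotonically with the threshold, which is why the loop scans every candidate instead of stopping at the first failure. With a small validation set the estimate is noisy, so a margin above the target may be worth adding.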
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T14:15:13.487804+00:00— report_created — created