Report #58338
[research] Answering obscure or trick questions incorrectly instead of abstaining or saying 'I don't know'
Implement an explicit selective question-answering protocol. Instruct the model: 'If you are not completely certain of the answer, output UNKNOWN.' Better yet, use a two-step pipeline: first classify whether the question is answerable given the context/knowledge, then generate the answer only if it is classified as answerable.
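A minimal sketch of the two-step pipeline described above, assuming the classifier and the generator are supplied as plain callables. The names `selective_answer`, `toy_is_answerable`, and `toy_generate` are hypothetical and not part of any library; the toy functions are stand-ins so the example runs without a model, and in practice each would wrap a model call.

```python
from typing import Callable

UNKNOWN = "UNKNOWN"  # abstention token, as in the suggested prompt


def selective_answer(
    question: str,
    context: str,
    is_answerable: Callable[[str, str], bool],
    generate: Callable[[str, str], str],
) -> str:
    """Step 1: gate on answerability. Step 2: generate only if the gate passes."""
    if not is_answerable(question, context):
        return UNKNOWN  # the generator is never called for rejected questions
    answer = generate(question, context).strip()
    return answer if answer else UNKNOWN


def _words(text: str) -> set:
    # Lowercase the text and strip basic punctuation from each token.
    return {w.strip("?.,!").lower() for w in text.split()}


def toy_is_answerable(question: str, context: str) -> bool:
    # Hypothetical heuristic: every long keyword in the question
    # must also appear in the context. A real gate would be a classifier.
    keywords = {w for w in _words(question) if len(w) > 4}
    return bool(keywords) and keywords <= _words(context)


def toy_generate(question: str, context: str) -> str:
    # Hypothetical generator: returns the last word of the context.
    return context.split()[-1].strip("?.,!")


if __name__ == "__main__":
    ctx = "The capital of France is Paris."
    print(selective_answer("What is the capital of France?", ctx,
                           toy_is_answerable, toy_generate))
    print(selective_answer("Who painted the ceiling of Atlantis?", ctx,
                           toy_is_answerable, toy_generate))
```

The point of the design is that the abstention decision lives outside the generator, so the gate can be tuned or swapped without touching the answer prompt.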
Journey Context:
LLMs have a strong completion drive: they will almost always attempt an answer, even when their knowledge is sparse, which leads to confabulation. Asking them to 'say I don't know if you don't know' in a zero-shot prompt has limited efficacy because the model's internal threshold for 'knowing' is miscalibrated. Decoupling the decision to answer from the generation of the answer (a two-model pipeline) significantly improves precision by allowing the classifier to enforce a strict boundary.
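One way to picture the "strict boundary" is as a threshold on the classifier's answerability score: raising it should trade coverage (the fraction of questions answered) for precision (the fraction of answered questions that are correct). The sketch below assumes the classifier emits a score in [0, 1]; the function name `precision_and_coverage` is hypothetical, and the scores are invented purely for illustration, not measured results.

```python
from typing import List, Tuple


def precision_and_coverage(
    scored: List[Tuple[float, bool]], threshold: float
) -> Tuple[float, float]:
    """scored: (answerability score, whether the generated answer was correct).

    Questions scoring below the threshold are treated as abstentions.
    """
    answered = [correct for score, correct in scored if score >= threshold]
    coverage = len(answered) / len(scored) if scored else 0.0
    # Precision is undefined when nothing is answered; this sketch reports 0.0.
    precision = sum(answered) / len(answered) if answered else 0.0
    return precision, coverage


if __name__ == "__main__":
    # Invented example data: NOT real evaluation results.
    data = [(0.95, True), (0.90, True), (0.70, False),
            (0.60, True), (0.30, False), (0.10, False)]
    for t in (0.5, 0.8):
        p, c = precision_and_coverage(data, t)
        print(f"threshold={t}: precision={p:.2f}, coverage={c:.2f}")
```

On this toy data the stricter threshold answers fewer questions but gets more of them right, which is the trade-off to evaluate on a held-out set before fixing a threshold.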
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T04:24:45.049256+00:00— report_created — created