Report #57918
[research] Prompting the model to 'say I don't know if you aren't sure' causes it to refuse to answer questions it actually knows, drastically reducing recall
Use selective prediction via constrained decoding. Restrict the output to the candidate answers and abstain only if the model's probability for the top answer (the softmax over its logits) falls below a calibrated threshold, rather than relying on zero-shot verbal abstention prompts.
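A minimal sketch of the abstention rule, assuming you already have one raw score per candidate answer from the constrained decode (for example, the summed token log-probabilities of each candidate). How those scores are obtained is model-specific and not shown. The names `softmax` and `selective_answer` are hypothetical helpers for illustration. Normalizing over the candidate set is one way to get a "top answer probability"; it is not necessarily the exact method behind this report.

```python
import math
from typing import List, Optional, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def selective_answer(
    candidates: Sequence[str], scores: Sequence[float], threshold: float
) -> Optional[str]:
    """Return the most probable candidate, or None (abstain) if its
    probability under the softmax over candidate scores is below threshold.

    Assumption: `scores` holds one log-likelihood-style score per candidate,
    produced elsewhere by constrained decoding.
    """
    probs = softmax(scores)
    best = max(range(len(candidates)), key=lambda i: probs[i])
    if probs[best] < threshold:
        return None  # abstain instead of guessing
    return candidates[best]


if __name__ == "__main__":
    # Confident case: one candidate dominates, so the model answers.
    print(selective_answer(["Paris", "Lyon", "Marseille"], [-0.1, -3.0, -4.0], 0.7))
    # Uncertain case: scores are close, top probability is about 0.37, so it abstains.
    print(selective_answer(["a", "b", "c"], [-1.0, -1.1, -1.2], 0.7))
```

Unlike a verbal "say I don't know" instruction, this rule never touches the prompt. The model always produces its best answer, and abstention is a post-hoc decision on a number you can tune.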
Journey Context:
Naive abstention prompts create an overly conservative prior. The model reads 'if you aren't sure' as a high-stakes warning and refuses even easy, high-frequency facts. Thresholding the top answer's probability instead lets you set the precision-recall tradeoff explicitly: a higher threshold filters out more of the likely hallucinations at the cost of answering fewer questions, and a threshold tuned on held-out data keeps coverage high while still filtering the least confident answers.
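One way to pick the "calibrated threshold" is to sweep candidate thresholds on a labeled held-out set and keep the lowest one that meets a target precision. Coverage only shrinks as the threshold rises, so the lowest qualifying threshold answers the most questions. This is a sketch under the assumption that each held-out record is a `(confidence, correct)` pair; `coverage_precision` and `calibrate_threshold` are hypothetical names, and the sample records below are made up for illustration.

```python
from typing import List, Optional, Tuple

Record = Tuple[float, bool]  # (top-answer probability, was the answer correct?)


def coverage_precision(records: List[Record], threshold: float) -> Tuple[float, float]:
    """Coverage = fraction of questions answered; precision = accuracy on those answered."""
    answered = [correct for conf, correct in records if conf >= threshold]
    coverage = len(answered) / len(records)
    precision = sum(answered) / len(answered) if answered else 1.0
    return coverage, precision


def calibrate_threshold(records: List[Record], target_precision: float) -> Optional[float]:
    """Lowest observed confidence whose selective precision meets the target.

    Precision is not guaranteed to be monotone in the threshold, so every
    candidate is checked in ascending order rather than binary-searched.
    """
    for t in sorted({conf for conf, _ in records}):
        _, prec = coverage_precision(records, t)
        if prec >= target_precision:
            return t
    return None  # no threshold reaches the target on this data


if __name__ == "__main__":
    # Illustrative held-out records, not real measurements.
    held_out = [(0.95, True), (0.9, True), (0.8, True), (0.6, False),
                (0.55, True), (0.4, False), (0.3, False)]
    for target in (0.8, 0.9):
        t = calibrate_threshold(held_out, target)
        print(target, t, coverage_precision(held_out, t))
```

On these toy records a 0.8 precision target yields a threshold of 0.55 and answers 5 of 7 questions, while a 0.9 target pushes the threshold to 0.8 and answers only 3. That is the tradeoff being dialed.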
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T03:42:19.280096+00:00— report_created — created