Report #55626
[research] Prompt-induced false refusals destroying recall
Avoid absolute "don't guess" instructions. Instead, use calibrated instructions such as "Answer if you have high confidence based on your training data; otherwise, state that you are unsure."
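As a rough illustration of the difference, the two instruction styles could be kept as prompt templates like the ones below. The constant names, the `build_prompt` helper, and the exact wording of the absolute rule are hypothetical and not tuned against any particular model.

```python
# Hypothetical prompt templates; the wording is illustrative only.

# Absolute rule: tends to trigger refusals even on common knowledge.
ABSOLUTE_RULE = "If you are not certain, say \"I don't know\". Never guess."

# Calibrated rule: allows an answer when confidence is high.
CALIBRATED_RULE = (
    "Answer if you have high confidence based on your training data; "
    "otherwise, state that you are unsure."
)


def build_prompt(question: str, rule: str = CALIBRATED_RULE) -> str:
    """Prepend the chosen instruction to the user's question."""
    return f"{rule}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    print(build_prompt("What is the capital of France?"))
```

Keeping both rules side by side makes it easy to compare refusal rates on a set of questions with known answers before committing to either wording.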
Journey Context:
A common anti-hallucination hack is to strictly instruct the model to say "I don't know" if unsure. This often destroys recall, leading to false refusals on common knowledge. Precision must be balanced with recall; self-consistency sampling (checking agreement across several sampled answers) is a better proxy for confidence than rigid prompt instructions.
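A minimal sketch of self-consistency sampling as a confidence proxy follows. It assumes a caller-supplied `sample_answer` function that returns one sampled model answer per call (for example, a model call with nonzero temperature); that function, the `normalize` helper, and the default thresholds are all hypothetical choices, not part of any specific library.

```python
from collections import Counter
from typing import Callable, Tuple


def normalize(answer: str) -> str:
    """Lowercase, trim, drop trailing periods, and collapse whitespace."""
    return " ".join(answer.lower().strip().rstrip(".").split())


def self_consistency(
    sample_answer: Callable[[str], str],  # hypothetical: one sampled answer per call
    question: str,
    n_samples: int = 5,          # assumed default, tune per task
    min_agreement: float = 0.6,  # assumed default, tune per task
) -> Tuple[str, float]:
    """Return (answer, agreement); answer is "unsure" below the threshold."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    answers = [normalize(sample_answer(question)) for _ in range(n_samples)]
    top_answer, count = Counter(answers).most_common(1)[0]
    agreement = count / n_samples
    if agreement >= min_agreement:
        return top_answer, agreement
    return "unsure", agreement


if __name__ == "__main__":
    # Stand-in sampler that replays canned answers instead of calling a model.
    canned = iter(["Paris", "paris.", " Paris ", "Lyon", "Paris"])
    print(self_consistency(lambda q: next(canned), "Capital of France?"))
```

Agreement across samples is only a proxy: a model can be consistently wrong, so the threshold should be checked against questions with known answers. Exact string matching after normalization also suits short factual answers better than free-form text.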
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T23:51:39.165631+00:00— report_created — created