Report #68559
[research] Providing a confident but incorrect answer when internal knowledge is insufficient instead of abstaining
Implement a selective prediction threshold. Prompt the model to output a verbalized confidence score (0-100), or derive a score from token log-probabilities. If the confidence falls below a calibrated threshold, output 'I don't know' or trigger a fallback (such as a web search).
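A minimal sketch of the gating step, assuming the answer and its confidence have already been obtained from the model. The names `selective_answer`, `confidence_from_logprobs`, and `ABSTAIN` are hypothetical and not tied to any particular library. Using the geometric-mean token probability as the score is one common heuristic, not the only option.

```python
import math
from typing import Callable, Optional, Sequence

ABSTAIN = "I don't know"  # hypothetical abstention message


def confidence_from_logprobs(token_logprobs: Sequence[float]) -> float:
    """Map per-token log-probabilities to a 0-100 score.

    Uses the geometric mean of the token probabilities (an assumed heuristic).
    """
    if not token_logprobs:
        return 0.0
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return 100.0 * math.exp(mean_logprob)


def selective_answer(
    answer: str,
    confidence: float,
    threshold: float,
    fallback: Optional[Callable[[], str]] = None,
) -> str:
    """Return the answer only when confidence clears the threshold.

    Below the threshold, call the fallback (for example a web search)
    if one is supplied, otherwise abstain.
    """
    if confidence >= threshold:
        return answer
    if fallback is not None:
        return fallback()
    return ABSTAIN
```

For example, `selective_answer("Paris", 92.0, 75.0)` returns the answer unchanged, while the same call with a confidence of 40.0 returns the abstention message. The threshold of 75.0 is purely illustrative; it should come from calibration on labeled data.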
Journey Context:
LLMs are often poorly calibrated by default: their stated confidence does not reliably correlate with correctness. Simply asking 'are you sure?' often makes them double down on errors. Selective prediction (abstaining when uncertain) can significantly improve the trustworthiness of the system, at the cost of a slight reduction in coverage (the share of questions that receive an answer).
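Because raw confidence scores are not trustworthy on their own, the threshold has to be chosen empirically. The sketch below assumes a labeled validation set of (confidence, was_correct) pairs and picks the lowest threshold at which the answered subset still meets a target accuracy, which maximizes coverage under that constraint. The function name `pick_threshold` and the sample data are hypothetical.

```python
from typing import List, Optional, Tuple


def pick_threshold(
    scored: List[Tuple[float, bool]],
    target_accuracy: float,
) -> Optional[Tuple[float, float]]:
    """Return (threshold, coverage) or None if no threshold meets the target.

    `scored` holds (confidence, was_correct) pairs from a validation set.
    Answering everything with confidence >= threshold must reach
    `target_accuracy`; among such thresholds the lowest one is returned.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    best: Optional[Tuple[float, float]] = None
    correct = 0
    for i, (conf, is_correct) in enumerate(ranked, start=1):
        correct += is_correct
        # Items with equal confidence are answered together, so only
        # evaluate once the whole tie group has been counted.
        if i < len(ranked) and ranked[i][0] == conf:
            continue
        if correct / i >= target_accuracy:
            best = (conf, i / len(ranked))
    return best


# Made-up validation data for illustration only.
validation = [
    (95.0, True), (90.0, True), (80.0, False),
    (70.0, True), (60.0, False), (30.0, False),
]
result = pick_threshold(validation, target_accuracy=0.75)
```

With this sample data, a 75% accuracy target yields a threshold of 70.0 with four of the six questions answered. Raising the target to 100% moves the threshold to 90.0 and coverage drops to two of six, which is the trade-off described above.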
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:33:42.064703+00:00— report_created — created