Report #84637
[research] LLM answers obscure or out-of-distribution questions confidently instead of admitting ignorance
Estimate the model's confidence from token probabilities (logprobs) or a dedicated self-reflection step. If that confidence falls below a threshold, or the provided context lacks supporting evidence, force an 'I don't know' or 'Insufficient information' response.
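The threshold rule can be sketched as follows. This is a minimal illustration, not a verified workaround: it assumes the serving API already returns one log-probability per generated token, and the names `mean_token_prob`, `answer_or_abstain`, and the 0.6 default threshold are hypothetical and would need tuning on held-out data.

```python
import math
from typing import Sequence

ABSTAIN = "I don't know"


def mean_token_prob(token_logprobs: Sequence[float]) -> float:
    """Geometric-mean token probability, i.e. exp(mean(logprobs)).

    Returns 0.0 for an empty sequence so that a missing signal is
    treated as low confidence rather than high confidence.
    """
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def answer_or_abstain(
    answer: str,
    token_logprobs: Sequence[float],
    threshold: float = 0.6,  # assumed value, not taken from the report
) -> str:
    """Return the answer only if its confidence reaches the threshold."""
    if mean_token_prob(token_logprobs) < threshold:
        return ABSTAIN
    return answer


if __name__ == "__main__":
    # High-probability tokens: the answer is passed through.
    print(answer_or_abstain("Paris", [-0.05, -0.10]))
    # Low-probability tokens: the answer is replaced by an abstention.
    print(answer_or_abstain("Zorblax", [-1.2, -2.3, -0.9]))
```

Averaging in log space keeps long answers from being penalized purely for their length. Token probability measures how fluent the model finds its own output, not whether the output is true, so this signal is best treated as one input among several.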
Journey Context:
LLMs are typically trained to produce an answer, which leaves them poorly calibrated about the boundaries of their own knowledge. Simply prompting 'say I don't know if you don't know' is often insufficient because the model can still generate a plausible internal rationale for a wrong answer. Explicitly checking the generation's logprobs or using a separate verification model provides a more robust uncertainty signal.
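The separate-verification idea can be sketched as a gate placed after generation. Everything named here is an assumption for illustration: `gate_answer`, the `Verifier` signature, and the 0.7 cutoff are hypothetical, and `toy_verifier` is a word-overlap stand-in for a real verification model so that the example runs on its own.

```python
from typing import Callable

INSUFFICIENT = "Insufficient information"

# Hypothetical interface: (question, draft_answer, context) -> support in [0, 1].
Verifier = Callable[[str, str, str], float]


def gate_answer(
    question: str,
    draft: str,
    context: str,
    verifier: Verifier,
    min_support: float = 0.7,  # assumed cutoff, tune on labelled examples
) -> str:
    """Release the draft only if the verifier finds it supported."""
    if not context.strip():
        # No evidence at all: abstain without calling the verifier.
        return INSUFFICIENT
    support = verifier(question, draft, context)
    return draft if support >= min_support else INSUFFICIENT


def toy_verifier(question: str, draft: str, context: str) -> float:
    """Stand-in for a real model: fraction of draft words found in context."""
    words = [w.strip(".,").lower() for w in draft.split()]
    if not words:
        return 0.0
    ctx = context.lower()
    return sum(w in ctx for w in words) / len(words)


if __name__ == "__main__":
    ctx = "The notes were written by Ada Lovelace."
    print(gate_answer("Who wrote the notes?", "Ada Lovelace", ctx, toy_verifier))
    print(gate_answer("Who wrote the notes?", "Grace Hopper", ctx, toy_verifier))
```

Passing the verifier in as a callable keeps the gate independent of any particular model or API. In practice the stand-in would be replaced by a second model call or an entailment check.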
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:39:08.978974+00:00 — report_created — created