Report #2207
[research] Model gives confident answers to questions beyond its knowledge cutoff or unsupported by context
Prompt the model to first assess whether it has sufficient evidence; if it does not, it should answer 'I don't know' or 'I need to look this up'. Couple this verbalized uncertainty with a selective-prediction threshold (low answer probability, high entropy, or self-consistency disagreement) and defer whenever one of those signals fires.
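The self-consistency and entropy checks could be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the function name `answer_or_defer`, the thresholds `min_agreement` and `max_entropy`, and the fallback wording are all assumptions made for this example, and the sampled answers are assumed to come from querying the model several times with the same prompt.

```python
import math
from collections import Counter


def answer_or_defer(samples, min_agreement=0.7, max_entropy=1.0,
                    fallback="I don't know; I need to look this up."):
    """Return the majority answer, or a deferral if the samples disagree.

    samples: answers sampled from the model for the same prompt.
    The threshold defaults are illustrative assumptions, not tuned values.
    """
    if not samples:
        return fallback

    # Normalize lightly so trivial formatting differences do not count as disagreement.
    counts = Counter(s.strip().lower() for s in samples)
    total = sum(counts.values())
    top_answer, top_count = counts.most_common(1)[0]

    # Self-consistency: fraction of samples that agree with the majority answer.
    agreement = top_count / total
    # Shannon entropy (in bits) of the empirical answer distribution.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())

    if agreement < min_agreement or entropy > max_entropy:
        return fallback
    return top_answer
```

With five samples where four agree, the sketch returns the majority answer; with samples split evenly across several answers, it returns the deferral string. In practice the thresholds would need to be calibrated on held-out questions.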
Journey Context:
Kadavath et al. show that models can often judge what they know; Lin et al. show that models can be trained to express uncertainty in words and that this improves calibration. Always answering maximizes coverage but buries hallucinations among the correct answers. Selective answering trades coverage for precision: the model answers fewer questions, but a larger share of the answers it does give are right. In code agents this means saying 'I am not sure about the exact signature in vX; let me search' rather than guessing.
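The coverage-for-precision trade can be made concrete with a small calculation. The sketch below assumes each answer comes with a confidence score and a correctness label from an evaluation set; the function name `selective_metrics` and the toy numbers are invented purely for illustration and are not results from the cited papers.

```python
def selective_metrics(confidences, correct, threshold):
    """Coverage and precision when answering only at or above a confidence threshold.

    confidences: per-question confidence scores in [0, 1].
    correct: per-question booleans, True if the model's answer was right.
    Returns (coverage, precision); precision is None if nothing is answered.
    """
    answered = [ok for conf, ok in zip(confidences, correct) if conf >= threshold]
    coverage = len(answered) / len(confidences) if confidences else 0.0
    precision = sum(answered) / len(answered) if answered else None
    return coverage, precision


# Toy data, made up for this example only.
confidences = [0.95, 0.9, 0.8, 0.6, 0.55, 0.3]
correct = [True, True, True, False, True, False]

always_answer = selective_metrics(confidences, correct, threshold=0.0)
selective = selective_metrics(confidences, correct, threshold=0.7)
```

On this toy data, answering everything gives full coverage with four of six answers correct, while a 0.7 threshold answers half the questions and gets all of those right. Real confidence scores are rarely this cleanly separated, so the threshold is a tuning decision.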
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T10:07:39.666842+00:00 - report_created - created