Report #98918
[research] Model answers questions outside its knowledge cutoff instead of saying 'I don't know'
Use token log-probabilities or self-consistency across sampled answers to abstain when confidence is low. For high-stakes facts, prefer refusal over hallucination, and route uncertain claims to a live search tool.
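The gating rule above could be sketched roughly as follows, using per-token log-probabilities as the confidence signal. This is a minimal illustration, not a tested recipe: the function names, the two thresholds, and the choice to refuse outright for high-stakes questions while sending other low-confidence questions to search are all assumptions made here, and the thresholds would need tuning on held-out data.

```python
import math
from typing import List


def mean_token_probability(token_logprobs: List[float]) -> float:
    """Geometric-mean token probability: exp of the average log-prob."""
    if not token_logprobs:
        return 0.0  # no evidence at all is treated as zero confidence
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def route(
    token_logprobs: List[float],
    high_stakes: bool,
    threshold: float = 0.8,              # hypothetical default, tune per task
    high_stakes_threshold: float = 0.95,  # hypothetical stricter bar
) -> str:
    """Return 'answer', 'search', or 'abstain' for one generated answer."""
    limit = high_stakes_threshold if high_stakes else threshold
    if mean_token_probability(token_logprobs) >= limit:
        return "answer"
    # Below the bar: refuse for high-stakes facts, otherwise hand off to search.
    return "abstain" if high_stakes else "search"
```

The log-probabilities are assumed to come from whatever the serving API exposes for the generated answer tokens; averaging in log space keeps long answers from being penalized purely for their length.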
Journey Context:
Kadavath et al. show that large language models are mostly well calibrated on multiple-choice and true/false questions posed in the right format: they can estimate the probability that their own answers are correct, and the estimate improves when the model is shown several of its own samples before judging one. Lin et al. ('Teaching Models to Express Their Uncertainty in Words') show that models can learn to verbalize their uncertainty. The practical lesson for agents is to ask 'are you sure?' and to use log-probs or consistency across samples to decide whether to answer or hand off to search.
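A consistency check of the kind mentioned above might look like the sketch below. It samples the model several times and answers only if enough samples agree. The `sample` callable stands in for a model call and is hypothetical, as are the `n` and `min_agreement` defaults. The string normalization is deliberately crude, so paraphrased answers would count as disagreement.

```python
from collections import Counter
from typing import Callable, Optional, Tuple


def normalize(answer: str) -> str:
    """Lowercase, collapse whitespace, and drop trailing periods."""
    return " ".join(answer.lower().split()).strip(" .")


def consistent_answer(
    sample: Callable[[str], str],  # hypothetical: one sampled model answer
    question: str,
    n: int = 5,
    min_agreement: float = 0.6,
) -> Tuple[Optional[str], float]:
    """Return (majority answer or None, agreement rate) over n samples."""
    if n < 1:
        raise ValueError("n must be at least 1")
    votes = Counter(normalize(sample(question)) for _ in range(n))
    top, count = votes.most_common(1)[0]
    agreement = count / n
    # None signals "not confident": the caller can abstain or call search.
    return (top if agreement >= min_agreement else None), agreement
```

A `None` result is the hand-off signal: the agent would then say it does not know or pass the question to a search tool rather than returning the majority answer.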
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-28T05:00:13.192046+00:00— report_created — created