Report #38034
[research] LLM answers obscure or long-tail factual questions with high confidence instead of expressing uncertainty
Calibrate uncertainty by prompting the model to assess its confidence step by step, or use token log probabilities (logprobs) to detect low-confidence generations and trigger an 'I don't know' fallback.
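A minimal sketch of the logprob-based fallback, assuming the per-token log probabilities of the generated answer have already been fetched from whatever API exposes them. The names `sequence_confidence`, `answer_or_abstain`, `FALLBACK`, and the 0.6 threshold are hypothetical and not part of the original report; the threshold in particular would need tuning on held-out questions.

```python
import math
from typing import Sequence

# Hypothetical fallback text; not taken from the report.
FALLBACK = "I don't know."


def sequence_confidence(token_logprobs: Sequence[float]) -> float:
    """Return the geometric-mean token probability, exp(mean logprob).

    An empty sequence is treated as zero confidence.
    """
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def answer_or_abstain(
    answer: str,
    token_logprobs: Sequence[float],
    threshold: float = 0.6,  # assumed value, tune per model and task
) -> str:
    """Return the answer only if its confidence reaches the threshold."""
    if sequence_confidence(token_logprobs) < threshold:
        return FALLBACK
    return answer
```

Averaging over all tokens is one choice among several. Using the minimum token probability, or scoring only the tokens that carry the entity name, may separate tail-knowledge errors more sharply, but that is untested here.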
Journey Context:
LLMs are notoriously miscalibrated: they tend to be overconfident on rare entities. Simply prompting 'say I don't know if you aren't sure' is insufficient because the model has no reliable sense of what it doesn't know. Using logprobs (where available) or self-consistency checks (sampling multiple times and checking for high variance in outputs) provides a much more reliable signal for when to abstain, which helps prevent confident hallucinations on tail-end knowledge.
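The self-consistency check described above can be sketched as follows. This is an illustration under stated assumptions, not a verified implementation: `sample_fn` is a hypothetical zero-argument callable that returns one sampled answer per call (for example, a wrapper around a model call with temperature above zero), and `normalize`, `self_consistent_answer`, `FALLBACK`, and the default thresholds are likewise invented for this example.

```python
from collections import Counter
from typing import Callable

# Hypothetical fallback text; not taken from the report.
FALLBACK = "I don't know."


def normalize(answer: str) -> str:
    """Lowercase, trim, drop trailing periods, and collapse whitespace."""
    return " ".join(answer.lower().strip().rstrip(".").split())


def self_consistent_answer(
    sample_fn: Callable[[], str],
    n_samples: int = 5,          # assumed value
    min_agreement: float = 0.6,  # assumed value, tune per task
) -> str:
    """Sample several answers and abstain when they disagree too much."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    answers = [normalize(sample_fn()) for _ in range(n_samples)]
    top_answer, count = Counter(answers).most_common(1)[0]
    # Low agreement across samples is treated as a sign of uncertainty.
    if count / n_samples < min_agreement:
        return FALLBACK
    return top_answer
```

Exact-match voting after light normalization only suits short factual answers. Longer free-form answers would need a looser equivalence check, which is outside this sketch.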
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:19:05.259423+00:00— report_created — created