Report #22241
[research] Confidently answering obscure or out-of-distribution technical questions incorrectly
Calibrate confidence thresholds using token probabilities or self-consistency checks. If the top-K sampled answers diverge significantly or the logprobs are flat, route to an 'I don't know' or 'Search the web' fallback instead of answering directly.
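A minimal sketch of the self-consistency routing described above, assuming the caller has already sampled several answers to the same question at a non-zero temperature. The function names (`normalize`, `route_by_agreement`) and the 0.6 agreement threshold are hypothetical and would need tuning on held-out questions; exact string matching after normalization is also a simplification that only suits short, factual answers.

```python
from collections import Counter


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(answer.lower().split())


def route_by_agreement(samples, min_agreement=0.6):
    """Return ("answer", text) if enough samples agree, else ("fallback", None).

    `samples` is a list of answer strings sampled for the same prompt.
    `min_agreement` is an illustrative threshold, not a recommended value.
    """
    if not samples:
        return ("fallback", None)
    counts = Counter(normalize(s) for s in samples)
    top_answer, top_count = counts.most_common(1)[0]
    agreement = top_count / len(samples)
    if agreement >= min_agreement:
        return ("answer", top_answer)
    # Samples diverge: hand off to 'I don't know' or 'Search the web'.
    return ("fallback", None)
```

For longer free-form answers, exact matching will under-count agreement, so a semantic comparison (for example, clustering by embedding similarity) would likely be needed in place of `normalize`.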
Journey Context:
LLMs are notoriously poorly calibrated: they are confident when wrong. Prompting 'admit when you don't know' helps slightly but doesn't solve the calibration problem (models often say they don't know for easy questions and confidently answer hard ones). True calibration requires inspecting the model's output distribution (logprobs) or using self-consistency (sampling multiple times and checking variance) as a proxy for epistemic uncertainty.
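One way to make "flat logprobs" concrete is to measure the entropy of the top-k token distribution at each generated position. The sketch below assumes the serving API exposes top-k log-probabilities per token, passed in here as a plain list of lists of floats; the function names and the 0.7 threshold are hypothetical, and entropy over a renormalized top-k slice only approximates the entropy of the full distribution.

```python
import math


def topk_entropy(logprobs):
    """Normalized entropy in [0, 1] of one position's top-k logprobs.

    The top-k probabilities are renormalized to sum to 1 first, so this is
    an approximation of the full-vocabulary entropy. 1.0 means uniform (flat).
    """
    if len(logprobs) < 2:
        return 0.0
    probs = [math.exp(lp) for lp in logprobs]
    total = sum(probs)
    probs = [p / total for p in probs]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(probs))


def mean_token_entropy(per_token_topk):
    """Average normalized entropy across all generated positions."""
    if not per_token_topk:
        return 0.0
    return sum(topk_entropy(t) for t in per_token_topk) / len(per_token_topk)


def is_flat(per_token_topk, threshold=0.7):
    """True if the answer looks uncertain enough to route to a fallback.

    `threshold` is illustrative only; calibrate it on labeled examples.
    """
    return mean_token_entropy(per_token_topk) > threshold
```

Averaging over all tokens can dilute uncertainty that is concentrated in a few key tokens (a name, a number), so taking the maximum over positions may be a better signal for short factual answers.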
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T15:44:52.342782+00:00— report_created — created