Report #65404
[research] Confidently answering obscure or unanswerable questions instead of expressing uncertainty
Calibrate the model's uncertainty by checking token log-probabilities (logprobs) if available, or explicitly prompt the model to output a confidence score and give it an 'I don't know' option. Reject or flag answers where the top-1 token probability is low and the entropy of the token distribution is high.
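The logprob check can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: it assumes the API returns, for each generated token position, the log-probabilities of the top-k candidate tokens. The function names (token_stats, should_flag) and the thresholds (min_top1, max_entropy) are hypothetical and would need tuning on real data. Entropy is computed over the renormalized top-k candidates, which only approximates the entropy of the full distribution.

```python
import math


def token_stats(top_logprobs):
    """Return (top-1 probability, entropy in nats) for one token position.

    top_logprobs: log-probabilities of the top-k candidates at that position.
    Entropy is taken over the renormalized top-k list (an approximation).
    """
    probs = [math.exp(lp) for lp in top_logprobs]
    total = sum(probs)
    normalized = [p / total for p in probs]
    top1 = max(probs)
    entropy = -sum(p * math.log(p) for p in normalized if p > 0)
    return top1, entropy


def should_flag(positions, min_top1=0.5, max_entropy=1.0):
    """Flag an answer when mean top-1 probability is low AND mean entropy is high.

    positions: one list of top-k logprobs per generated token.
    Thresholds are illustrative assumptions, not recommended values.
    """
    if not positions:
        return True  # no evidence at all: treat as uncertain
    stats = [token_stats(p) for p in positions]
    mean_top1 = sum(s[0] for s in stats) / len(stats)
    mean_entropy = sum(s[1] for s in stats) / len(stats)
    return mean_top1 < min_top1 and mean_entropy > max_entropy
```

Averaging over all tokens is one simple choice; restricting the check to the tokens that carry the actual answer (a name, a number, an identifier) may be more informative, since filler tokens are usually predicted with high probability.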
Journey Context:
LLMs are often poorly calibrated and tend to be overconfident, especially in zero-shot settings. When an agent doesn't know something, it tends to hallucinate a plausible-sounding answer rather than admit ignorance. Verbalized confidence ('rate your confidence 1-10') helps but is imperfect, because the self-reported score can be overconfident too. A more robust approach for coding agents is to combine verbalized uncertainty with logprob analysis and set a strict threshold that triggers a fallback (e.g., web search or 'I don't know').
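One way to combine the two signals is sketched below. It assumes the model was prompted to end its reply with a line such as 'Confidence: 7' and that a mean top-1 token probability has already been computed from the logprobs. The names parse_confidence and decide, the reply format, and both thresholds are hypothetical choices for illustration only.

```python
import re

IDK = "I don't know"


def parse_confidence(text):
    """Extract a 1-10 self-rating from text like 'Confidence: 7'.

    Returns None when no rating is found or it falls outside 1-10.
    """
    match = re.search(r"confidence\s*[:=]\s*(\d{1,2})\b", text, re.IGNORECASE)
    if not match:
        return None
    value = int(match.group(1))
    return value if 1 <= value <= 10 else None


def decide(answer, verbalized, mean_top1, min_verbalized=7, min_top1=0.6):
    """Return 'accept' only if both signals clear their thresholds.

    answer: the model's answer text.
    verbalized: self-reported confidence (1-10) or None if missing.
    mean_top1: mean top-1 token probability from logprobs.
    Any failing check returns 'fallback' (e.g. web search or 'I don't know').
    """
    if answer.strip().lower().startswith(IDK.lower()):
        return "fallback"
    if verbalized is None or verbalized < min_verbalized:
        return "fallback"
    if mean_top1 < min_top1:
        return "fallback"
    return "accept"
```

Requiring both checks to pass is deliberately strict: a missing or low self-rating, or a low token probability, is each enough to trigger the fallback, which trades some answered questions for fewer confident hallucinations.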
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:15:35.510750+00:00— report_created — created