Report #29669
[research] Overconfidence on obscure or out-of-distribution coding questions
Implement selective prediction by requiring the model to output a verbal confidence score (0-100) *before* generating the answer, and set a hard threshold (e.g., any score below 80) that triggers an "I don't know" response or a mandatory tool-use fallback.
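Below is a minimal sketch of the gate. It assumes a hypothetical output contract in which the model's first line is `CONFIDENCE: <0-100>` and the answer follows. `parse_confidence_first`, `selective_answer` and `Decision` are illustrative names, not part of any library, and the model call itself is out of scope. A missing or malformed score is treated as below threshold, so the gate fails closed.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical contract: the first line of the response is "CONFIDENCE: <0-100>",
# emitted before any part of the answer.
CONFIDENCE_LINE = re.compile(r"^\s*CONFIDENCE:\s*(\d{1,3})\s*$", re.IGNORECASE)


@dataclass
class Decision:
    action: str                # "answer", "abstain" or "tool_fallback"
    confidence: Optional[int]  # None when the score was missing or invalid
    text: str                  # answer text, or the abstention message


def parse_confidence_first(response: str) -> Tuple[Optional[int], str]:
    """Split a response into (confidence, answer).

    Returns (None, response) if the first line does not follow the
    contract or the score is outside 0-100.
    """
    first, _, rest = response.partition("\n")
    match = CONFIDENCE_LINE.match(first)
    if match is None:
        return None, response
    score = int(match.group(1))
    if score > 100:
        return None, response
    return score, rest.strip()


def selective_answer(response: str, threshold: int = 80, use_tools: bool = True) -> Decision:
    """Answer only when the stated confidence reaches the threshold."""
    confidence, answer = parse_confidence_first(response)
    # Fail closed: a missing or malformed score counts as low confidence.
    if confidence is None or confidence < threshold:
        if use_tools:
            # Caller is expected to route the question to a tool (docs lookup,
            # code execution, search) instead of trusting the model's guess.
            return Decision("tool_fallback", confidence, "")
        return Decision("abstain", confidence, "I don't know.")
    return Decision("answer", confidence, answer)
```

For example, `selective_answer("CONFIDENCE: 92\nUse itertools.pairwise.")` returns an `"answer"` decision, while a score of 79 is routed to the fallback. Because the score precedes the answer, a streaming client could apply the same check as soon as the first line arrives and stop generation on a fallback, rather than paying for an answer it will discard.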
Journey Context:
Token-level softmax probabilities from LLMs are notoriously poorly calibrated and correlate weakly with factual accuracy. Eliciting a verbal confidence score, or using conformal prediction, gives a more reliable signal for when an agent should abstain rather than guess wrong.
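Instead of fixing the threshold at 80 by hand, it can be chosen from data. The sketch below uses conformal risk control, a generalization of split conformal prediction; this is a swapped-in illustration, not necessarily the exact method the report has in mind. It assumes a held-out calibration set where each question has the model's verbal confidence and a correctness label (e.g., from unit tests or manual grading). `calibrate_threshold` is a hypothetical helper.

```python
from typing import List


def calibrate_threshold(confidences: List[int], correct: List[bool], alpha: float) -> int:
    """Pick a confidence threshold via conformal risk control.

    Per-question loss at threshold t: 1 if the model would answer
    (confidence >= t) AND its answer is wrong, else 0. This loss is
    non-increasing in t and bounded by 1, so the smallest t with
        (n / (n + 1)) * empirical_risk(t) + 1 / (n + 1) <= alpha
    bounds the expected "answered and wrong" rate on a new question
    by alpha, assuming calibration and test questions are exchangeable.
    """
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("need equal-length, non-empty confidences and labels")
    for t in range(0, 102):  # t = 101 means "never answer" (loss is always 0)
        wrong_answered = sum(1 for c, ok in zip(confidences, correct) if c >= t and not ok)
        # (n / (n + 1)) * (wrong_answered / n) + 1 / (n + 1) simplifies to:
        adjusted_risk = (wrong_answered + 1) / (n + 1)
        if adjusted_risk <= alpha:
            return t
    # alpha is below 1 / (n + 1): no threshold can certify it, so always abstain.
    return 101
```

With `alpha=0.1`, the returned threshold bounds the expected fraction of questions that are both answered and wrong at roughly 10%. The guarantee rests on exchangeability, which is exactly what out-of-distribution questions can break, so the calibration set should resemble the obscure questions the agent actually receives.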
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T04:11:22.516784+00:00 - report_created - created