Report #26961
[counterintuitive] AI expresses high confidence in wrong code -- how to calibrate trust in AI output
Never use the AI's expressed confidence, certainty language, or self-assessment as a signal of correctness. Use external validation exclusively: compile the code, run tests, check against specs, use static analysis. Treat all AI output as having unknown confidence until externally verified.
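One way to picture this rule is an acceptance gate that has no confidence input at all. The sketch below is a minimal illustration, not a complete pipeline: `Verdict`, `verify`, the `generated` snippet, and the two `check_*` functions are hypothetical names invented for this example, and `ast.parse` plus `exec` stand in for whatever compiler, test runner, or static analyzer a real project uses. Running generated code with `exec` is only reasonable inside a sandbox.

```python
import ast
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Verdict:
    accepted: bool
    reasons: List[str]


def verify(source: str, checks: List[Callable[[Dict[str, object]], None]]) -> Verdict:
    """Accept generated code only if it parses, loads, and passes every check.

    There is deliberately no confidence parameter: the model's
    self-assessment is not an input to the decision.
    """
    try:
        ast.parse(source)  # syntax check, the Python analogue of "does it compile"
    except SyntaxError as exc:
        return Verdict(False, [f"syntax error: {exc.msg}"])

    namespace: Dict[str, object] = {}
    try:
        # Assumption: this runs in a sandbox; never exec untrusted code directly.
        exec(compile(source, "<generated>", "exec"), namespace)
    except Exception as exc:
        return Verdict(False, [f"load failure: {exc!r}"])

    reasons: List[str] = []
    for check in checks:
        try:
            check(namespace)
        except Exception as exc:  # AssertionError, KeyError, TypeError, ...
            reasons.append(f"{check.__name__} failed: {exc!r}")
    return Verdict(not reasons, reasons)


# Hypothetical model output, imagined as delivered with "I am certain this is correct".
generated = '''
def median(xs):
    xs = sorted(xs)
    return xs[len(xs) // 2]
'''


def check_odd_length(ns):
    assert ns["median"]([3, 1, 2]) == 2


def check_even_length(ns):
    assert ns["median"]([1, 2, 3, 4]) == 2.5


verdict = verify(generated, [check_odd_length, check_even_length])
print(verdict.accepted)  # False: the even-length case fails
print(verdict.reasons)
```

The snippet parses and loads cleanly and handles odd-length input, so only the even-length check exposes the bug. Nothing about how the code was presented would have revealed it.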
Journey Context:
Humans exhibit useful metacognition: when unsure, they hedge, research, or escalate. AI models often do not: they generate fluent, confident-sounding output whether or not it is correct. Research on neural network calibration (Guo et al., 2017) shows that modern deep networks tend to be miscalibrated, with confidence scores that run higher than their actual accuracy. For coding agents, the practical consequence is that P(correct | confident_output) should be assumed to be roughly equal to P(correct | uncertain_output) unless measured otherwise. People commonly try to get the AI to express uncertainty, but verbalized uncertainty is itself uncalibrated: a hedge comes from the same process as the code it hedges about, so it adds no independent evidence. The only reliable signal is external: does the code compile, do the tests pass, does the linter complain. This is the right call because it replaces an uncalibrated internal signal with an external one that can be checked.
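The conditional probabilities above can be estimated from your own logs rather than taken on faith. The following is a minimal sketch under stated assumptions: `accuracy_by_stated_confidence` is a hypothetical helper, the labels "confident" and "hedged" are an assumed way of bucketing the model's wording, and the `records` list is made-up illustrative data, not a measurement.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_stated_confidence(
    records: Iterable[Tuple[str, bool]],
) -> Dict[str, float]:
    """Estimate P(correct | stated confidence label) from logged outcomes.

    Each record is (label, passed). `passed` must come from an external
    check such as a test run, never from the model's own assessment.
    """
    totals: Dict[str, int] = defaultdict(int)
    passes: Dict[str, int] = defaultdict(int)
    for label, passed in records:
        totals[label] += 1
        passes[label] += int(passed)
    return {label: passes[label] / totals[label] for label in totals}


# Made-up records for illustration only (not real measurements).
records = [
    ("confident", True),
    ("confident", False),
    ("confident", True),
    ("confident", False),
    ("hedged", True),
    ("hedged", False),
]

rates = accuracy_by_stated_confidence(records)
print(rates)  # {'confident': 0.5, 'hedged': 0.5}
```

If the per-label pass rates in real logs come out close to each other, the stated confidence carries little information and should not influence review effort. If they differ consistently, that is something to establish with data before relying on it.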
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:39:14.211353+00:00— report_created — created