Report #52441
[counterintuitive] Does AI confidence in its code output predict correctness?
Treat all AI code output as having similar base probability of error regardless of how confident the AI sounds. Do not use AI's stated confidence as a signal for whether to verify. Always verify against external ground truth (documentation, tests, specs) rather than AI's self-assessment. When AI expresses uncertainty, that is occasionally a real signal; when AI expresses confidence, it is nearly meaningless.
Journey Context:
AI models are poorly calibrated for code generation: their expressed confidence does not reliably predict whether their output is correct. A model will state something with equal apparent certainty whether it is drawing from well-represented training data or hallucinating. This is fundamentally different from human experts, whose confidence is a meaningful signal — a senior engineer who says 'I am not sure about this' is usually right to be uncertain, and their confidence usually correlates with correctness. For AI, confidence is a function of how well the prompt matches common patterns in training data, not how well the answer matches reality. A confidently wrong answer is more dangerous than a hesitantly wrong one because humans are calibrated to trust confidence from other humans and they transfer this calibration to AI. The practical impact: when AI says 'this is the standard way to do X,' it may be citing a real standard or hallucinating one, and you cannot tell which from the AI's tone. The only reliable signal is external verification. The alternative of using AI's self-assessed confidence as a triage signal is actively harmful because it causes developers to skip verification on the most dangerous outputs — the confidently wrong ones.
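A minimal sketch of the triage rule described above, in Python. Every name here (`Candidate`, `verify`, the two leap-year functions, `CASES`) is hypothetical and chosen for illustration; none of it comes from the report. It assumes you have a few ground-truth cases taken from a spec or documentation. The stated confidence is recorded but deliberately never consulted when deciding whether to accept the code.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence, Tuple


@dataclass
class Candidate:
    name: str
    stated_confidence: str  # what the AI claimed; kept for the record, never used for triage
    func: Callable[..., Any]


def verify(candidate: Candidate, cases: Sequence[Tuple[tuple, Any]]) -> bool:
    """Return True only if every ground-truth case passes."""
    for args, expected in cases:
        try:
            if candidate.func(*args) != expected:
                return False
        except Exception:
            # A crash on a ground-truth input counts as a failure.
            return False
    return True


# Hypothetical AI outputs for "is this year a leap year?"
def leap_confident(year: int) -> bool:
    # Imagined as delivered with "this is the standard way"; wrong for century years.
    return year % 4 == 0


def leap_hesitant(year: int) -> bool:
    # Imagined as delivered with "I am not sure about the century rule".
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Ground truth from the Gregorian calendar rules, not from the AI.
CASES = [((2000,), True), ((1900,), False), ((2024,), True), ((2023,), False)]

candidates = [
    Candidate("A", "high", leap_confident),
    Candidate("B", "low", leap_hesitant),
]

# Every candidate goes through the same checks, whatever its stated confidence.
results = {c.name: verify(c, CASES) for c in candidates}
print(results)  # {'A': False, 'B': True}
```

In this contrived case the confidently presented candidate fails on 1900 while the hesitant one passes, so skipping verification for "high confidence" output would have accepted exactly the wrong one.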
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T18:31:06.544008+00:00— report_created — created