Report #69350
[counterintuitive] AI confidence in its code output predicts correctness
Never use an AI's expressed confidence as a signal of correctness. Instead, rely on external verification: run the code, write assertions, lean on type systems, and apply static analysis. When an AI says 'I'm confident' or 'this is correct,' treat it as neutral information with zero predictive value.
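A minimal sketch of external verification in Python, assuming a reference implementation is available to test against. Here `ai_is_leap` stands in for a hypothetical AI-suggested answer, the standard library's `calendar.isleap` acts as the trusted oracle, and `find_mismatches` is an illustrative helper, not part of any tool. The model's stated confidence is not an input anywhere; only observed behavior decides.

```python
import calendar
from typing import Callable, Iterable


def find_mismatches(
    candidate: Callable[[int], bool],
    reference: Callable[[int], bool],
    inputs: Iterable[int],
) -> list[tuple[int, bool, bool]]:
    """Run both functions on every input and collect (input, got, expected)
    for each case where the candidate disagrees with the reference."""
    mismatches = []
    for x in inputs:
        got, expected = candidate(x), reference(x)
        if got != expected:
            mismatches.append((x, got, expected))
    return mismatches


# Hypothetical AI-suggested code, delivered with "I'm confident this is correct."
# It ignores the century rules of the Gregorian calendar.
def ai_is_leap(year: int) -> bool:
    return year % 4 == 0


mismatches = find_mismatches(ai_is_leap, calendar.isleap, range(1800, 2101))
print(mismatches)  # [(1800, True, False), (1900, True, False), (2100, True, False)]
```

Differential testing against an oracle is only one option. When no reference exists, hand-written assertions on known cases, a type checker such as mypy, or a linter play the same role: they produce evidence that does not depend on how the answer was phrased.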
Journey Context:
Humans develop calibrated confidence through experience: when a senior engineer says 'I'm 90% sure this works,' it is a meaningful signal. LLMs lack this calibration mechanism. They can express high confidence in wrong answers and low confidence in correct ones, and their stated confidence shows no reliable correlation with correctness. The model's 'confidence' is a function of how well the prompt matches patterns in its training data, not of how likely the answer is to be correct. This creates a dangerous asymmetry: the cases where the AI is most confidently wrong are often the cases where humans are least likely to double-check, because the AI's confidence is persuasive. Verbal hedging ('I believe', 'it seems like') does not correlate with accuracy either. Even models specifically trained to express uncertainty remain poorly calibrated on code tasks.
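One way to test the calibration claim against your own usage is to log each answer's stated confidence next to whether it later passed verification, then compare observed accuracy with stated confidence per bucket (expected calibration error). This is a sketch under assumptions: the `Outcome` record, the mapping of verbal phrases onto a 0-to-1 number, and the five-bin default are hypothetical choices, and the demo records are synthetic, not measured results.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    # Hypothetical log record: confidence mapped from the model's wording onto
    # [0, 1] (the mapping itself is an assumption), plus the verdict from
    # external checks such as tests, type checking, or static analysis.
    stated_confidence: float
    passed_verification: bool


def calibration_table(
    outcomes: list[Outcome], n_bins: int = 5
) -> list[tuple[int, int, float, float]]:
    """Group outcomes into equal-width confidence bins and return
    (bin_index, count, mean_stated_confidence, observed_accuracy)
    for every non-empty bin."""
    bins: list[list[Outcome]] = [[] for _ in range(n_bins)]
    for o in outcomes:
        if not 0.0 <= o.stated_confidence <= 1.0:
            raise ValueError("stated_confidence must lie in [0, 1]")
        # Confidence 1.0 would map to index n_bins, so clamp it into the top bin.
        idx = min(int(o.stated_confidence * n_bins), n_bins - 1)
        bins[idx].append(o)
    rows = []
    for i, members in enumerate(bins):
        if not members:
            continue
        mean_conf = sum(o.stated_confidence for o in members) / len(members)
        accuracy = sum(o.passed_verification for o in members) / len(members)
        rows.append((i, len(members), mean_conf, accuracy))
    return rows


def expected_calibration_error(outcomes: list[Outcome], n_bins: int = 5) -> float:
    """Count-weighted average gap between stated confidence and observed
    accuracy; 0.0 means the two matched in every bin."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    total = len(outcomes)
    return sum(
        count / total * abs(mean_conf - accuracy)
        for _, count, mean_conf, accuracy in calibration_table(outcomes, n_bins)
    )


# Synthetic records for illustration only: every answer claimed 95% confidence,
# but only half of them passed verification.
demo = [Outcome(0.95, True), Outcome(0.95, False)] * 2
print(round(expected_calibration_error(demo), 2))  # 0.45
```

A large gap in the top bin is exactly the dangerous asymmetry described above: answers stated with the most confidence are the ones a reviewer is least inclined to re-check.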
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T22:53:32.557170+00:00— report_created — created