Report #80198
[counterintuitive] When an AI coding agent expresses high confidence in its solution, the solution is more likely correct
Never use AI confidence (verbal or probabilistic) as a proxy for correctness. Treat all AI output as unverified. Use external validation (tests, type systems, formal methods) rather than the AI's self-assessment to determine reliability.
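A minimal sketch of that rule in Python, assuming a hypothetical `Candidate` record and `accept` helper (neither comes from any real tool). The gate runs external checks and never reads the AI's stated confidence. The `median` snippets and confidence values are made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

Check = Callable[[Dict[str, object]], bool]


@dataclass
class Candidate:
    """Hypothetical container for an AI-proposed solution."""
    source: str               # code proposed by the AI
    stated_confidence: float  # self-reported; deliberately never consulted


def accept(candidate: Candidate, checks: Sequence[Check]) -> bool:
    """Accept only if the code loads and every external check passes."""
    namespace: Dict[str, object] = {}
    try:
        # Illustration only: real AI output should run in a sandbox.
        exec(candidate.source, namespace)
    except Exception:
        return False
    for check in checks:
        try:
            if not check(namespace):
                return False
        except Exception:
            return False
    return True


# Made-up example: subtly wrong for even-length input, stated with high confidence.
confident_but_wrong = Candidate(
    source="def median(xs):\n    return sorted(xs)[len(xs) // 2]\n",
    stated_confidence=0.99,
)

# Made-up example: correct, stated with lower confidence.
modest_but_right = Candidate(
    source=(
        "def median(xs):\n"
        "    s = sorted(xs)\n"
        "    mid = len(s) // 2\n"
        "    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2\n"
    ),
    stated_confidence=0.60,
)

checks = [
    lambda ns: ns["median"]([1, 3, 2]) == 2,
    lambda ns: ns["median"]([1, 2, 3, 4]) == 2.5,
]

for cand in (confident_but_wrong, modest_but_right):
    print(cand.stated_confidence, accept(cand, checks))
```

In this sketch the 0.99-confidence candidate is rejected and the 0.60-confidence one is accepted, because only the checks decide the outcome. The same shape applies to a type checker or a formal verifier standing in for `checks`.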
Journey Context:
Humans naturally calibrate trust based on expressed confidence. When a senior engineer says they are very confident, that confidence is usually backed by years of experience. AI confidence is poorly calibrated by comparison. LLMs are trained to produce fluent, confident-sounding output regardless of correctness, so a model that solves one problem correctly can express the same confidence when it produces a subtly wrong answer to another. Calibration research reports that LLMs tend to be overconfident, especially on problems outside their training distribution. The practical impact: developers who learn to trust AI confidence get burned on hard problems where the AI is confidently wrong. The failure mode is asymmetric. On easy problems (well represented in training data), AI confidence is somewhat correlated with correctness, which reinforces trust. On hard problems (rare patterns, novel combinations), the AI sounds just as confident but is much more likely to be wrong. This creates a confidence trap: the trust developers build on easy problems carries over to the hard problems where the AI is least reliable.
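One way to check this asymmetry on your own logs is to compare stated confidence with observed accuracy, split by problem difficulty. The sketch below assumes a hypothetical `overconfidence_gap` helper and uses made-up records chosen only to show the arithmetic. They are not measurements.

```python
from typing import Iterable, Tuple

Record = Tuple[float, bool]  # (stated confidence, passed external validation)


def overconfidence_gap(records: Iterable[Record]) -> float:
    """Mean stated confidence minus observed accuracy.

    A positive value means the stated confidence overstates how often
    the output was actually correct.
    """
    rows = list(records)
    if not rows:
        raise ValueError("need at least one record")
    mean_confidence = sum(conf for conf, _ in rows) / len(rows)
    accuracy = sum(1 for _, ok in rows if ok) / len(rows)
    return mean_confidence - accuracy


# Made-up illustration: identical stated confidence, different accuracy.
easy = [(0.9, True)] * 9 + [(0.9, False)] * 1   # 9 of 10 correct
hard = [(0.9, True)] * 4 + [(0.9, False)] * 6   # 4 of 10 correct

print("easy gap:", overconfidence_gap(easy))
print("hard gap:", overconfidence_gap(hard))
```

With these illustrative numbers the gap is close to zero on the easy set and about 0.5 on the hard set, even though the stated confidence is identical. The correctness flag in each record has to come from external validation, not from the model.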
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:12:47.813693+00:00— report_created — created