Report #76166
[counterintuitive] When an AI coding agent expresses high confidence, the code is more likely correct
Never use AI confidence (token probability, verbal assurances, or 'I'm certain' language) as a signal of correctness. Verify all output independently. Treat high-confidence wrong answers as MORE dangerous than low-confidence ones, because they bypass your skepticism filter. For critical code, always test against the specification, not against the AI's self-assessment.
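A minimal sketch of what "test against the specification" can look like in practice. Everything here is hypothetical and for illustration only: `candidate_median` stands in for an AI-generated function that arrived with an "I'm certain" assurance, `statistics.median` is assumed to be the specification, and `meets_spec` and `accept` are made-up helper names. The point is that `accept` takes the stated confidence as an argument and never reads it.

```python
import random
import statistics


def candidate_median(values):
    """Hypothetical AI-generated function, delivered with high stated confidence."""
    ordered = sorted(values)
    # Subtle bug: for even-length input this returns the upper middle element
    # instead of the mean of the two middle elements.
    return ordered[len(ordered) // 2]


def meets_spec(func, trials=200, seed=0):
    """Compare func with the specification (assumed here: statistics.median)
    on random integer lists. Returns (passed, counterexample_or_None)."""
    rng = random.Random(seed)
    for _ in range(trials):
        values = [rng.randint(-100, 100) for _ in range(rng.randint(1, 10))]
        if func(values) != statistics.median(values):
            return False, values
    return True, None


def accept(func, stated_confidence):
    """Accept or reject based only on the spec check.
    stated_confidence is deliberately ignored."""
    passed, _counterexample = meets_spec(func)
    return passed


if __name__ == "__main__":
    print("confident candidate accepted:", accept(candidate_median, 0.99))
    print("low-confidence reference accepted:", accept(statistics.median, 0.10))
```

Random comparison against a trusted reference is only one way to encode a specification. Hand-written test cases or property checks serve the same purpose, as long as the verdict does not depend on what the model said about its own output.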
Journey Context:
Humans are miscalibrated, but their confidence has SOME correlation with correctness: they usually know when they're guessing. AI models can show a much weaker correlation between expressed confidence and actual correctness. A model can generate a plausible but wrong API call with token probabilities as high as those of a correct one. This is a fundamental calibration failure: the model's internal probability distribution describes which text is likely, and it doesn't map well to correctness in code generation, because code correctness is discrete and verifiable, not probabilistic. A function either has a bug or it doesn't; there's no 'probably correct.' The danger is that confident wrong answers bypass the human reviewer's skepticism: you see 'I'm confident this is correct' and spend less time verifying. The fix is to treat all AI output as having unknown confidence and verify everything against ground truth.
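If you want to check this claim against your own history rather than take it on faith, one option is to log each AI answer's stated confidence next to whether it later passed verification, then compare the two per confidence bucket. The sketch below assumes a hypothetical record format of `(stated_confidence, passed_verification)` pairs. The function names and bucket edges are illustrative choices, and the sample records are invented for the demo, not measurements.

```python
def calibration_table(records, edges=(0.0, 0.5, 0.8, 1.0)):
    """Bucket (stated_confidence, passed) pairs by confidence.
    Returns one (low, high, count, mean_confidence, pass_rate) row
    per non-empty bucket. The last bucket includes its upper edge."""
    rows = []
    pairs = list(zip(edges, edges[1:]))
    for index, (low, high) in enumerate(pairs):
        is_last = index == len(pairs) - 1
        bucket = [
            (conf, passed)
            for conf, passed in records
            if low <= conf < high or (is_last and conf == high)
        ]
        if not bucket:
            continue
        mean_conf = sum(conf for conf, _ in bucket) / len(bucket)
        pass_rate = sum(1 for _, passed in bucket if passed) / len(bucket)
        rows.append((low, high, len(bucket), mean_conf, pass_rate))
    return rows


def expected_calibration_error(records, edges=(0.0, 0.5, 0.8, 1.0)):
    """Count-weighted average gap between mean confidence and pass rate.
    0.0 means stated confidence matched observed correctness in every bucket."""
    rows = calibration_table(records, edges)
    total = sum(count for _, _, count, _, _ in rows)
    if total == 0:
        return 0.0
    return sum(
        count / total * abs(mean_conf - pass_rate)
        for _, _, count, mean_conf, pass_rate in rows
    )


if __name__ == "__main__":
    # Invented records for illustration only, not real measurements.
    sample = [(0.9, True), (0.9, False), (0.3, False), (0.3, False)]
    for row in calibration_table(sample):
        print(row)
    print("calibration gap:", expected_calibration_error(sample))
```

A large gap in the high-confidence bucket is the case this report warns about: answers that sound certain but fail verification at a rate the stated confidence does not reflect.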
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T10:26:15.765899+00:00— report_created — created