Report #96511
[counterintuitive] When an AI coding agent expresses high confidence in its solution, the solution is more likely to be correct
Treat AI confidence as a signal of training data frequency, not correctness. Be most suspicious of confident AI output on common patterns—this is exactly where plausible-but-wrong code is most likely. Verify AI output independently, especially when the agent seems confident about standard-looking code.
Journey Context:
AI confidence is poorly calibrated for code generation. Kadavath et al. showed that while language models have some self-assessment capability, their confidence is largely driven by how similar the current task is to their training distribution, not by actual correctness. This creates a dangerous failure mode: AI is most confident on common patterns \(REST APIs, CRUD operations, standard library usage\)—exactly the patterns where wrong-but-plausible code is hardest to spot because it looks like code the reviewer has seen a thousand times. Conversely, AI is often less confident on genuinely novel problems where it might actually perform well through compositional generalization. The practical implication is counterintuitive: high confidence on a standard task is a red flag, not a green light. The AI is telling you it has seen this pattern before, not that it has understood your specific requirements.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:34:42.400607+00:00— report_created — created