Report #50744
[counterintuitive] When an AI coding agent expresses high confidence in its solution, the code is more likely correct
Ignore AI confidence signals entirely. Verify all AI-generated code through compilation, type checking, testing, and human review regardless of how certain the model sounds. Treat confident-but-wrong output as the default failure mode, not an edge case.
Journey Context:
LLMs are systematically miscalibrated: they express high confidence even when wrong, and their stated confidence does not reliably predict correctness. This differs from human calibration, where confidence and accuracy at least correlate directionally (even if humans are overconfident). In coding tasks, this shows up as plausible, well-structured code that contains subtle bugs and is delivered with apparent certainty. OpenAI's GPT-4 system card describes the model's tendency to hallucinate and warns against overreliance on its output. This is particularly dangerous in code because: (1) incorrect code that looks correct is more harmful than obviously wrong code, since it gets committed, merged, and deployed; (2) the model's confident tone reduces the reviewer's vigilance (a form of anchoring bias); and (3) in interactive sessions, the model may defend incorrect solutions with confident-sounding but wrong explanations, further entrenching the error. The only reliable signals of correctness come from external verification: does the code compile, do the tests pass, and does it produce correct output for known cases?
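The external checks described above can be sketched as a small acceptance gate. This is a minimal illustration, not a prescribed tool: the names `Verdict`, `verify`, `CANDIDATE`, and `CASES` are hypothetical, and the leap-year function is an invented stand-in for plausible-looking generated code. The gate accepts a `stated_confidence` argument only to show that it plays no part in the decision.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class Verdict:
    compiles: bool
    passed: int
    failed: int

    @property
    def accepted(self) -> bool:
        # Accept only if the code compiles and every known case passes.
        return self.compiles and self.failed == 0 and self.passed > 0


def verify(
    source: str,
    func_name: str,
    known_cases: List[Tuple[tuple, Any]],
    stated_confidence: Optional[float] = None,
) -> Verdict:
    """Judge candidate code by external checks only (hypothetical helper)."""
    del stated_confidence  # deliberately ignored: it is not evidence

    # Check 1: does it compile?
    try:
        code = compile(source, "<candidate>", "exec")
    except (SyntaxError, ValueError):
        return Verdict(compiles=False, passed=0, failed=0)

    # Check 2: does it define the function without crashing on load?
    # NOTE: exec runs the code in-process. Untrusted code should be run
    # in a sandbox instead; this sketch skips that for brevity.
    namespace: dict = {}
    try:
        exec(code, namespace)
        func = namespace[func_name]
    except Exception:
        return Verdict(compiles=True, passed=0, failed=len(known_cases))

    # Check 3: does it produce correct output for known cases?
    passed = failed = 0
    for args, expected in known_cases:
        try:
            ok = func(*args) == expected
        except Exception:
            ok = False
        if ok:
            passed += 1
        else:
            failed += 1
    return Verdict(compiles=True, passed=passed, failed=failed)


# Invented example: looks reasonable, but omits the divisible-by-400 rule.
CANDIDATE = """
def is_leap_year(year):
    return year % 4 == 0 and year % 100 != 0
"""

CASES = [((2024,), True), ((2023,), False), ((1900,), False), ((2000,), True)]

verdict = verify(CANDIDATE, "is_leap_year", CASES, stated_confidence=0.99)
print(verdict)           # Verdict(compiles=True, passed=3, failed=1)
print(verdict.accepted)  # False
```

The candidate compiles and passes three of four cases, so a quick read or a thin test set would let it through. Only the year-2000 case exposes the bug, which is why the known cases need to cover boundary conditions rather than just typical inputs.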
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T15:39:35.979251+00:00— report_created — created