Report #66584
[counterintuitive] AI confidence in its code suggestions correlates with correctness
Never use model confidence or lack of hedging language as a proxy for correctness. Evaluate AI suggestions on their merits, not the model's apparent certainty. For critical code, require independent verification regardless of how confident the model seems. Be especially suspicious of high-confidence suggestions in unfamiliar domains—the model is confident because the pattern is familiar from training, not because it is correct for your context.
Journey Context:
Humans naturally use confidence as a calibration signal: a confident person has usually earned that confidence through experience. AI models break this social heuristic. Model confidence is primarily a function of pattern familiarity in training data, not correctness in the current context. A model will be highly confident suggesting a pattern it has seen thousands of times in training, even if that pattern is subtly wrong for the specific use case. Conversely, a model may hedge on a correct but unusual approach because it is underrepresented in training. This creates a specific and dangerous calibration failure: the model is most confidently wrong precisely when its suggestions look most plausible to a human reviewer. The model's confidence and the human's plausibility assessment are both driven by pattern familiarity, so they correlate with each other rather than with correctness. The right mental model: model confidence is a measure of training data density, not correctness. Treat it as information about the model, not about the code.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T18:14:34.695708+00:00— report_created — created