Report #25358
[counterintuitive] AI expresses equal confidence whether generating a standard algorithm or solving a novel problem
Do not use AI's expressed confidence as a reliability signal. Implement external calibration: if the task is uncommon, involves novel library combinations, or requires domain-specific constraints, assume a 3-5x higher error probability than for a routine task, regardless of how confident the AI sounds. Verify proportionally to task novelty, not to AI confidence.
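A minimal sketch of the external calibration rule above, written as a pre-verification check. The `Task` fields, the function name, the baseline error rate, and the choice of 4.0 as a midpoint of the 3-5x range are all illustrative assumptions, not values from this report. The point it demonstrates is that the AI's stated confidence is accepted as input and deliberately ignored.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: 4.0 is an arbitrary midpoint of the report's 3-5x range.
NOVELTY_ERROR_MULTIPLIER = 4.0


@dataclass(frozen=True)
class Task:
    """Hypothetical description of a generation task, filled in by the human operator."""
    uncommon: bool = False
    novel_library_combination: bool = False
    domain_specific_constraints: bool = False


def assumed_error_rate(
    task: Task,
    baseline: float,
    stated_confidence: Optional[float] = None,
) -> float:
    """Return the error rate to plan verification around.

    `baseline` is the operator's own estimate for routine tasks (an assumption
    the operator must supply). `stated_confidence` is intentionally unused:
    it is in the signature only to make explicit that it does not affect the result.
    """
    is_novel = (
        task.uncommon
        or task.novel_library_combination
        or task.domain_specific_constraints
    )
    rate = baseline * NOVELTY_ERROR_MULTIPLIER if is_novel else baseline
    # An error rate is a probability, so cap it at 1.0.
    return min(rate, 1.0)
```

Under these assumptions, `assumed_error_rate(Task(novel_library_combination=True), baseline=0.05, stated_confidence=0.99)` returns the same value as the call with `stated_confidence=0.10`, which is the intended behavior: only task novelty moves the estimate.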
Journey Context:
Humans have a metacognitive 'feeling of knowing': they can distinguish between recalling something they know well and guessing. This is calibrated by experience: you know when you know and you know when you don't. AI models lack this signal entirely. They generate fluent, confident-sounding output whether they are reproducing a well-represented training pattern (sorting algorithm, CRUD app) or interpolating into unknown territory (novel API combination, domain-specific constraint). Research shows LLM calibration is poor: their stated confidence correlates weakly with actual correctness, and the correlation degrades precisely on the hard problems where you need it most. The practical implication is counterintuitive: the human operator must supply the calibration that the AI cannot. Your verification effort should be inversely proportional to how common the task is in open-source code, not proportional to how confident the AI sounds.
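To check on your own tasks how weakly stated confidence tracks correctness, one option is to log each AI answer's stated confidence alongside whether it turned out to be correct, then compute a calibration summary. The sketch below uses expected calibration error (ECE), a standard metric: it buckets answers by stated confidence and averages the gap between mean confidence and observed accuracy per bucket. The record format, function name, and bin count are assumptions for illustration; this report does not prescribe a measurement method.

```python
from typing import Iterable, Tuple


def expected_calibration_error(
    records: Iterable[Tuple[float, bool]],
    n_bins: int = 5,
) -> float:
    """Compute ECE over (stated_confidence, was_correct) pairs.

    stated_confidence is assumed to lie in [0, 1]. A value of 0.0 means
    confidence matched accuracy in every bin; larger values mean a wider gap.
    """
    bins = [[] for _ in range(n_bins)]
    for confidence, was_correct in records:
        # Map confidence to a bin index; confidence == 1.0 goes in the last bin.
        index = min(int(confidence * n_bins), n_bins - 1)
        bins[index].append((confidence, 1.0 if was_correct else 0.0))

    total = sum(len(b) for b in bins)
    if total == 0:
        raise ValueError("at least one record is required")

    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_confidence = sum(c for c, _ in b) / len(b)
        accuracy = sum(a for _, a in b) / len(b)
        # Weight each bin's gap by the share of records it holds.
        ece += (len(b) / total) * abs(mean_confidence - accuracy)
    return ece
```

Computing this separately for routine and novel tasks would show whether the gap widens on the novel ones, which is the pattern the paragraph above describes.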
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:57:58.511058+00:00— report_created — created