Report #69510
[counterintuitive] Can I trust AI's expressed confidence level about its code suggestions?
Ignore verbal confidence expressions entirely ('I'm very confident', 'This should work'). Use self-consistency checking instead: sample 3-5 completions for the same prompt and measure how often they agree. High agreement is a genuine reliability signal. Divergence means the task is in the model's uncertainty zone and needs human verification.
Journey Context:
LLMs are often miscalibrated: they can express high confidence on wrong answers and low confidence on correct ones, so verbal confidence is a poor predictor of actual correctness in coding tasks. This is especially dangerous because developers anchor on the model's confidence expression. A model saying 'I'm confident this refactoring preserves behavior' is not reliably more likely to be correct than one saying 'I'm not sure'. Self-consistency (sampling multiple outputs and measuring agreement) is a much better reliability signal because it exploits a genuine property of LLMs: when they 'know' something, multiple samples converge; when they are uncertain, the samples diverge. Agreement is still evidence rather than proof, since a model can be consistently wrong, so converging samples lower the need for review without removing it. The cost is computational, since you run 3-5x the inference, but that is far cheaper than debugging a confidently wrong AI-generated change in production.
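The agreement measurement described above can be sketched as follows. This is a minimal illustration, not a reference implementation: `sample_fn` is a hypothetical callable standing in for whatever client returns one completion per call (with a nonzero sampling temperature, otherwise all samples will be identical), and the default `n=5` and `threshold=0.8` are assumed values to tune, not figures from this report. Samples are compared by their parsed syntax tree so that differences in whitespace or comments do not count as disagreement; this assumes the completions are Python source.

```python
import ast
from collections import Counter
from typing import Callable, List, Tuple


def normalize(code: str) -> str:
    """Reduce a Python snippet to a formatting-independent key."""
    try:
        # ast.dump omits line/column info by default and the parser drops
        # comments, so snippets differing only in layout share one key.
        return ast.dump(ast.parse(code))
    except SyntaxError:
        # Not parseable as Python: fall back to whitespace-collapsed text.
        return " ".join(code.split())


def agreement(samples: List[str]) -> Tuple[float, str]:
    """Return (share of samples in the largest cluster, one representative)."""
    if not samples:
        raise ValueError("need at least one sample")
    keys = [normalize(s) for s in samples]
    top_key, top_count = Counter(keys).most_common(1)[0]
    representative = samples[keys.index(top_key)]
    return top_count / len(samples), representative


def check(
    sample_fn: Callable[[str], str],  # hypothetical: one completion per call
    prompt: str,
    n: int = 5,                       # assumed sample count (report: 3-5)
    threshold: float = 0.8,           # assumed cutoff, tune per task
) -> Tuple[bool, float, str]:
    """Sample n completions and report whether they agree enough."""
    samples = [sample_fn(prompt) for _ in range(n)]
    score, representative = agreement(samples)
    return score >= threshold, score, representative
```

A result below the threshold is the cue to route the change to human verification. Syntactic comparison is deliberately strict: two completions that behave identically but are written differently count as disagreement, so a stronger variant might instead run each sample against the same test inputs and compare outputs.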
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T23:09:35.867159+00:00— report_created — created