Report #21368
[counterintuitive] When an LLM expresses high confidence in its output, the output is more likely to be correct
Never use model-expressed confidence as a proxy for correctness. Implement external verification through tests, type checking, and linting for all generated code. Calibrate trust based on independent validation, not model self-assessment.
Journey Context:
It is natural to trust confident outputs more than hesitant ones, but LLM confidence is poorly calibrated: models can be highly confident about wrong answers and uncertain about correct ones. Research shows that models have some ability to assess their own knowledge, yet this calibration is far from reliable, particularly on complex reasoning tasks. The gap is especially dangerous in code generation, where a confidently wrong API call or incorrect algorithm can compile and run yet still produce subtle bugs. RLHF can worsen calibration by training models to sound more confident regardless of correctness. For coding agents: never skip verification because the model seems sure. Always run generated code through tests, type checkers, and linters. The model's confidence is a product of its training, not a signal of correctness.
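The advice above can be sketched as a small verification gate. This is a minimal illustration, not a reference implementation: the names `CheckResult`, `run_check`, `verify_generated_code`, and the `model_confidence` parameter are hypothetical, and the sketch assumes the generated code is a single Python module with a plain assert-based test script. Only a syntax check and a test run are wired in; a type checker or linter could be appended in the same way if one is installed.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class CheckResult:
    name: str
    passed: bool
    output: str


def run_check(name: str, cmd: list[str]) -> CheckResult:
    """Run one external check; a zero exit code counts as a pass."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
    except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
        # A missing or hung tool is treated as a failure, never a silent pass.
        return CheckResult(name, False, str(exc))
    return CheckResult(name, proc.returncode == 0, proc.stdout + proc.stderr)


def verify_generated_code(code: str, test_code: str, model_confidence: float) -> bool:
    """Accept generated code only if every external check passes.

    model_confidence (hypothetical parameter) is accepted and then
    deliberately ignored: it must not influence the decision.
    """
    del model_confidence
    with tempfile.TemporaryDirectory() as tmp:
        module = Path(tmp) / "generated.py"
        tests = Path(tmp) / "check_generated.py"
        module.write_text(code)
        tests.write_text(test_code)
        checks = [
            run_check("syntax", [sys.executable, "-m", "py_compile", str(module)]),
            run_check("tests", [sys.executable, str(tests)]),
            # Assumption: if mypy or ruff is installed, further checks could
            # be added here, e.g. [sys.executable, "-m", "mypy", str(module)]
            # or ["ruff", "check", str(module)].
        ]
    return all(check.passed for check in checks)


if __name__ == "__main__":
    tests = "from generated import add\nassert add(2, 3) == 5\n"
    confident_but_wrong = "def add(a, b):\n    return a - b\n"
    hesitant_but_right = "def add(a, b):\n    return a + b\n"
    print(verify_generated_code(confident_but_wrong, tests, model_confidence=0.99))
    print(verify_generated_code(hesitant_but_right, tests, model_confidence=0.10))
```

In this sketch the confidently wrong implementation is rejected and the hesitant but correct one is accepted, because the decision depends only on the exit codes of the external checks. Treating a missing tool as a failure is a design choice: it keeps the gate from passing simply because a check could not run.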
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T14:16:41.850741+00:00— report_created — created