Report #2792
[research] LLM expressing high confidence in incorrect code or factual statements without hedging or saying 'I don't know'
Implement calibrated uncertainty via self-consistency checks (sample the model N times and measure how much the outputs vary), and instruct the model to state 'I don't know' explicitly when code outputs diverge across samples.
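The check described above could be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `sample_fn` is a hypothetical callable standing in for whatever client draws one sample from the model at non-zero temperature, and the `min_agreement` threshold of 0.6 is an arbitrary assumption that would need tuning. Samples are compared by exact match after whitespace normalization, which is a crude stand-in for comparing the behavior of generated code.

```python
from collections import Counter
from typing import Callable, List

ABSTAIN = "I don't know"


def normalize(answer: str) -> str:
    """Collapse whitespace so trivially different samples compare equal."""
    return " ".join(answer.split())


def self_consistent_answer(
    sample_fn: Callable[[str], str],  # hypothetical: returns one model sample
    prompt: str,
    n: int = 10,
    min_agreement: float = 0.6,  # assumed threshold, tune per task
) -> str:
    """Sample n times; return the majority answer, or abstain if samples diverge."""
    samples: List[str] = [sample_fn(prompt) for _ in range(n)]
    keys = [normalize(s) for s in samples]

    # Most frequent normalized answer and how often it occurred.
    top_key, count = Counter(keys).most_common(1)[0]

    if count / n < min_agreement:
        return ABSTAIN

    # Return the first original sample that matches the majority answer.
    return samples[keys.index(top_key)]
```

For generated code, a stricter variant would run each sample against the same test inputs and compare the results rather than the source text, since two programs can differ textually and still agree.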
Journey Context:
Standard prompting encourages definitive answers. Token log-probabilities are not a reliable confidence signal on their own: a high-probability output is not necessarily factually accurate (the calibration gap). Self-consistency provides a better proxy for confidence: if 10 samples yield 10 different solutions, the model does not 'know' the answer.
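One way to turn "how different are the samples" into a single number is the normalized entropy of the sampled answers. The sketch below is an assumption about how such a score might be computed, not something taken from this report: `disagreement` is a hypothetical helper that returns 0 when every sample is identical and approximately 1 when every sample is distinct, as in the 10-different-solutions case.

```python
import math
from collections import Counter
from typing import Sequence


def disagreement(samples: Sequence[str]) -> float:
    """Normalized Shannon entropy of the sampled answers, in [0, 1]."""
    n = len(samples)
    if n < 2:
        return 0.0  # a single sample carries no disagreement signal

    counts = Counter(samples)
    # Entropy of the empirical answer distribution: sum of p * log(1/p).
    entropy = sum((c / n) * math.log(n / c) for c in counts.values())

    # Divide by log(n), the maximum entropy when all n samples differ.
    return entropy / math.log(n)
```

A score near 0 suggests the model answers consistently; a score near 1 suggests it is guessing. Consistency is still only a proxy, since a model can be consistently wrong.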
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T13:57:09.552451+00:00— report_created — created