Report #84535
[counterintuitive] Asking the model 'are you sure?' or 'rate your confidence' to detect when it doesn't know something
Do not rely on model self-reported confidence. Use external verification: run the query multiple times and check consistency, use retrieval to ground answers, implement calibration-based confidence scoring, or use models specifically trained for uncertainty estimation.
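The first option above, running the query multiple times and checking consistency, might be sketched as follows. This is a minimal illustration, not a definitive implementation: `ask` is a hypothetical callable wrapping whatever model client is in use (assumed to sample with nonzero temperature), and exact-match comparison after normalization is an assumption that only suits short, closed-form answers.

```python
from collections import Counter
from typing import Callable, Tuple


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings compare equal."""
    return " ".join(answer.lower().split())


def consistency_score(
    ask: Callable[[str], str], prompt: str, n_samples: int = 5
) -> Tuple[str, float]:
    """Sample the model n_samples times; return (majority answer, agreement fraction).

    `ask` is a hypothetical stand-in for a real model call. It is not part of
    any specific library.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    answers = [normalize(ask(prompt)) for _ in range(n_samples)]
    top_answer, count = Counter(answers).most_common(1)[0]
    return top_answer, count / n_samples
```

A low agreement fraction is a reasonable signal to fall back to retrieval or to flag the answer for review. High agreement is weaker evidence, since a model can be consistently wrong, so this check complements grounding rather than replacing it.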
Journey Context:
A widespread practice is to append 'Rate your confidence from 1-10' or 'Are you sure about this?' to prompts. This is unreliable because LLMs lack dependable introspective access to their own knowledge. The model doesn't query a knowledge base to check coverage; it generates tokens based on learned patterns. It can hallucinate a plausible-sounding answer and then confirm it just as confidently when asked 'are you sure?'. Studies of self-reported confidence have often found that it correlates poorly with actual accuracy. A stated confidence rating is itself just generated text, and even the underlying token probabilities only measure how likely a continuation is given the context, which tracks fluency and consistency with the text so far, not factual correctness. Asking 'are you sure?' mostly produces more confident-sounding text.
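One way to check how well self-reported confidence tracks accuracy on your own prompts is to compute expected calibration error (ECE) over a labeled evaluation set. The sketch below assumes you have already collected one confidence per answer, scaled to [0, 1] (for example a 1-10 rating divided by 10, which is itself an assumption), plus a ground-truth correct/incorrect label. The function name and the equal-width binning are illustrative choices.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Weighted average gap between stated confidence and observed accuracy.

    0.0 means stated confidence matches accuracy within every bin; larger
    values mean the stated confidence is less trustworthy.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("need at least one sample")

    # Equal-width bins over [0, 1]; a confidence of exactly 1.0 goes in the last bin.
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidences must lie in [0, 1]")
        bins[min(int(conf * n_bins), n_bins - 1)].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        ece += (len(items) / total) * abs(avg_conf - accuracy)
    return ece
```

For instance, four answers all rated 0.9 of which only one is correct give a gap of 0.9 - 0.25 = 0.65, the overconfident pattern described above.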
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:29:02.879821+00:00— report_created — created