Report #45795
[research] Model answers a question with high confidence even when its internal knowledge is insufficient, rather than saying 'I don't know'
Calibrate the model's generation with an explicit verbalized confidence score, or use a selective-generation framework. Prompt the model to output one of 'Confidence: Low', 'Confidence: Medium', or 'Confidence: High' before its answer, and program the agent to abort or escalate when confidence is Low.
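The gating step described above can be sketched as follows. This is a minimal sketch that assumes the model's raw text output is available as a string. The prompt wording, the regex, the `GatedResult` type, and the choice to treat a missing confidence line as Low are illustrative assumptions, not part of any particular agent framework.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical prompt instruction; the exact wording is an assumption.
CONFIDENCE_INSTRUCTION = (
    "Before answering, state your confidence on its own line as "
    "'Confidence: Low', 'Confidence: Medium', or 'Confidence: High', "
    "then give the answer."
)

# Matches a line that starts with "Confidence: <level>", case-insensitively.
_CONF_RE = re.compile(
    r"^\s*Confidence:\s*(Low|Medium|High)\b", re.IGNORECASE | re.MULTILINE
)


@dataclass
class GatedResult:
    action: str                # "answer" or "escalate"
    confidence: Optional[str]  # "Low" / "Medium" / "High", or None if unparseable
    answer: Optional[str]      # text after the confidence line, only when answering


def gate_response(raw: str) -> GatedResult:
    """Parse the verbalized confidence line and decide whether to answer or escalate."""
    match = _CONF_RE.search(raw)
    if match is None:
        # No parseable confidence: fail safe by escalating (an assumed policy).
        return GatedResult("escalate", None, None)
    level = match.group(1).capitalize()  # normalize e.g. "LOW" -> "Low"
    if level == "Low":
        # Abort/escalate instead of returning a low-confidence answer.
        return GatedResult("escalate", level, None)
    answer = raw[match.end():].strip()
    return GatedResult("answer", level, answer)
```

For example, `gate_response("Confidence: High\nParis is the capital of France.")` returns an `answer` result with the text after the confidence line, while `"Confidence: low\n..."` is escalated. Escalating on a missing or malformed confidence line keeps the gate fail-safe: the agent only answers when the model explicitly reports Medium or High.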
Journey Context:
LLMs are poorly calibrated out of the box: their stated confidence correlates only weakly with their actual accuracy. Simply asking 'are you sure?' often makes them double down on errors. However, structural prompting (forcing a confidence assessment before the answer) combined with temperature scaling has been reported to improve selective generation, allowing agents to abstain when their confidence falls below a threshold.
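Temperature scaling plus threshold-based abstention can be sketched as below. This is a minimal sketch under stated assumptions: it assumes access to per-option logits (for example, over the choices of a multiple-choice question) and a small labelled validation set. The function names, the grid-search fit (standing in for the usual gradient-based fit of a single temperature), and the threshold value are illustrative assumptions, not a method taken from this report.

```python
import math
from typing import List, Optional, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled log-softmax, computed stably via log-sum-exp."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - lse for z in scaled]


def nll(logit_rows: Sequence[Sequence[float]], labels: Sequence[int], temperature: float) -> float:
    """Mean negative log-likelihood of the true labels at a given temperature."""
    total = sum(-log_softmax(row, temperature)[y] for row, y in zip(logit_rows, labels))
    return total / len(labels)


def fit_temperature(logit_rows, labels, grid: Optional[Sequence[float]] = None) -> float:
    """Pick the temperature that minimizes validation NLL (simple grid search)."""
    if grid is None:
        grid = [0.25 * k for k in range(1, 41)]  # candidate temperatures 0.25 .. 10.0
    return min(grid, key=lambda t: nll(logit_rows, labels, t))


def selective_predict(logits: Sequence[float], temperature: float, threshold: float) -> Optional[int]:
    """Return the argmax option if its calibrated probability clears the threshold, else None (abstain)."""
    probs = [math.exp(lp) for lp in log_softmax(logits, temperature)]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None


# Usage: an overconfident model that is right only 3 times out of 4 on validation.
val_logits = [[4.0, 0.0], [4.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
val_labels = [0, 1, 0, 1]
T = fit_temperature(val_logits, val_labels)      # fitted T > 1 softens the probabilities
decision = selective_predict([4.0, 0.0], T, 0.9)  # None means abstain / escalate
```

Here the raw probability of about 0.98 overstates a 75% hit rate; after fitting, the calibrated probability drops to roughly 0.75, so a 0.9 threshold makes the agent abstain. Raising the threshold trades coverage (how often the agent answers) for accuracy on the questions it does answer.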
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:20:38.540467+00:00 - report_created - created