Report #90288
[research] LLM expresses high confidence when its internal knowledge is insufficient or outdated
Use explicit self-assessment prompting (e.g., 'Rate your confidence from 1-10. If it is below 8, say "I don't know".') and enforce abort or research conditions for low-confidence generations.
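A minimal sketch of how such a gate might be enforced in code rather than left to the model, assuming the model is asked to end its reply with a line like "Confidence: N". The names `ask_model`, `research`, `PROMPT_SUFFIX`, `parse_confidence`, and `gated_answer` are hypothetical and not part of any library; the threshold of 8 is taken from the example prompt above. Treat it as an illustration under those assumptions, not a verified implementation.

```python
import re
from typing import Callable, Optional

# Threshold taken from the example prompt in this report; tune per task.
CONFIDENCE_THRESHOLD = 8

# Hypothetical suffix appended to every question so the reply is parseable.
PROMPT_SUFFIX = (
    "\n\nAfter your answer, add a final line of the form 'Confidence: N', "
    "where N is an integer from 1 to 10."
)

_CONF_RE = re.compile(r"confidence\s*[:=]\s*(\d{1,2})\b", re.IGNORECASE)


def parse_confidence(reply: str) -> Optional[int]:
    """Return the last self-reported score in the reply, or None if absent or out of range."""
    matches = _CONF_RE.findall(reply)
    if not matches:
        return None
    score = int(matches[-1])
    return score if 1 <= score <= 10 else None


def gated_answer(
    question: str,
    ask_model: Callable[[str], str],
    research: Optional[Callable[[str], str]] = None,
    threshold: int = CONFIDENCE_THRESHOLD,
) -> str:
    """Return the model's reply only if its self-rated confidence meets the threshold.

    ask_model and research are caller-supplied callables (assumptions of this
    sketch): ask_model sends a prompt to an LLM and returns its text, research
    is an optional fallback such as a search or retrieval tool.
    """
    reply = ask_model(question + PROMPT_SUFFIX)
    score = parse_confidence(reply)
    if score is not None and score >= threshold:
        return reply
    # Low or missing confidence: hand off to research if available, else abort.
    if research is not None:
        return research(question)
    return "I don't know"
```

A missing or unparseable score is treated the same as a low one, so a reply that ignores the format falls through to the research or abort path instead of being passed along as confident.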
Journey Context:
Standard RLHF tends to train models to sound confident, so verbalized confidence is often poorly calibrated. However, recent work suggests that prompting for an explicit confidence rating and allowing a fallback to tool use or 'I don't know' significantly reduces hallucination rates without hurting overall task completion.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:08:37.538060+00:00— report_created — created