Report #3278
[research] Agent answers obscure or out-of-distribution questions with high confidence instead of abstaining
Implement self-consistency checks (sample the answer N times; if the samples disagree widely or no answer occurs more than once, abstain), or use token logprobs to trigger an 'I don't know' fallback when confidence drops below a threshold.
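A minimal sketch of the self-consistency check, for illustration only. `sample_fn` is a hypothetical callable standing in for one sampled model call at nonzero temperature, and both `n=5` and `min_agreement=0.6` are assumed values that would need tuning per task. Agreement is measured here by exact match after simple normalization, which suits short factual answers; free-form answers would need a looser comparison.

```python
from collections import Counter
from typing import Callable, Optional


def self_consistent_answer(
    sample_fn: Callable[[str], str],  # hypothetical: one sampled model call
    question: str,
    n: int = 5,                       # assumed sample count
    min_agreement: float = 0.6,       # assumed threshold, tune on held-out data
) -> Optional[str]:
    """Return the majority answer, or None to signal abstention."""
    # Normalize so trivial formatting differences do not count as disagreement.
    answers = [sample_fn(question).strip().lower() for _ in range(n)]
    top_answer, top_count = Counter(answers).most_common(1)[0]
    # Abstain when the most common answer covers too small a share of samples.
    # This also covers the case where every sampled answer is different.
    if top_count / n < min_agreement:
        return None
    return top_answer
```

A caller would map a `None` result to the 'I don't know' fallback rather than surfacing any single sample.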
Journey Context:
LLMs inherently lack a calibrated sense of their own ignorance. Prompting 'say I don't know if you aren't sure' often causes over-refusal on easy questions while failing to catch hallucinations on hard ones. Self-consistency checks and logprob thresholds replace that instruction with a measurable signal and an explicit, tunable cutoff for abstention. The cutoff itself still has to be calibrated against questions with known answers.
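The logprob variant can be sketched as follows, assuming the serving API can return per-token log-probabilities for the generated answer. The function name `answer_or_abstain`, the `FALLBACK` string, and the `min_avg_prob=0.7` cutoff are all illustrative assumptions, not values from this report. The sketch uses the geometric mean of token probabilities so that longer answers are not penalized for length alone.

```python
import math
from typing import Sequence

FALLBACK = "I don't know"  # assumed fallback wording


def answer_or_abstain(
    answer: str,
    token_logprobs: Sequence[float],  # natural-log probability of each answer token
    min_avg_prob: float = 0.7,        # assumed cutoff, tune on held-out data
) -> str:
    """Return the answer, or the fallback when average token confidence is low."""
    # With no logprobs available there is no confidence signal, so abstain.
    if not token_logprobs:
        return FALLBACK
    # exp(mean logprob) is the geometric mean of the token probabilities.
    avg_logprob = sum(token_logprobs) / len(token_logprobs)
    if math.exp(avg_logprob) < min_avg_prob:
        return FALLBACK
    return answer
```

Token-level confidence reflects how likely the wording was, not whether the claim is true, so a cutoff like this is best treated as one signal alongside the sampling check.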
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T15:59:21.845667+00:00— report_created — created