Report #15420
[research] LLM guesses an answer with high confidence instead of abstaining when it lacks sufficient information
Explicitly define an "unanswerable" (insufficient context) output class in the system prompt, and use self-consistency sampling (generate N responses; if they disagree too much, abstain).
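A minimal sketch of the first half of this fix, the explicit abstention class. The label `INSUFFICIENT_CONTEXT`, the `FINAL:` line convention, and the helper `parse_final` are all hypothetical names chosen for illustration; they are not part of any model API, and the prompt wording would need tuning for a given model.

```python
# Hypothetical label for the "unanswerable" output class.
ABSTAIN_LABEL = "INSUFFICIENT_CONTEXT"

# Illustrative system prompt; the exact wording is an assumption.
SYSTEM_PROMPT = (
    "Answer using only the provided context.\n"
    "End your reply with one line of the form 'FINAL: <answer>'.\n"
    "If the context does not contain enough information to answer, "
    f"reply 'FINAL: {ABSTAIN_LABEL}'. Do not guess."
)


def parse_final(reply: str) -> str:
    """Return the text after the last 'FINAL:' line.

    A reply with no 'FINAL:' line is treated as an abstention, so a
    malformed reply is never mistaken for a confident answer.
    """
    for line in reversed(reply.strip().splitlines()):
        if line.strip().upper().startswith("FINAL:"):
            return line.split(":", 1)[1].strip()
    return ABSTAIN_LABEL
```

Treating a missing `FINAL:` line as an abstention is a conservative design choice: it trades some recall for never passing free-form text downstream as an answer.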
Journey Context:
Base models and standard RLHF models are typically trained in ways that penalize refusing to answer, which biases them against saying "I don't know." Self-consistency (a majority vote over multiple chain-of-thought rollouts) gives an agent an empirical signal of its own uncertainty: when the rollouts disagree on the final answer, the agent can abstain safely instead of hallucinating with confidence.
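The majority vote described here can be sketched as follows. This is an illustration under stated assumptions, not a verified implementation: `sample` is a hypothetical callable that runs one rollout and returns only its final answer string, and the defaults `n=5` and `min_agreement=0.6` are arbitrary and would need calibration on real data.

```python
from collections import Counter
from typing import Callable, Optional


def self_consistent_answer(
    sample: Callable[[], str],          # hypothetical: one rollout -> final answer
    n: int = 5,                         # number of rollouts (assumed default)
    min_agreement: float = 0.6,         # required vote share (assumed default)
    abstain_label: str = "INSUFFICIENT_CONTEXT",  # hypothetical abstention label
) -> Optional[str]:
    """Majority-vote over n sampled answers; return None to signal abstention."""
    if n < 1:
        raise ValueError("n must be at least 1")

    # Normalize lightly so trivial formatting differences do not split the vote.
    answers = [sample().strip().lower() for _ in range(n)]
    top_answer, top_count = Counter(answers).most_common(1)[0]

    # Abstain if the model itself most often reports insufficient context.
    if top_answer == abstain_label.lower():
        return None
    # Abstain if the rollouts disagree too much.
    if top_count / n < min_agreement:
        return None
    return top_answer
```

Exact string matching after lowercasing is the simplest possible agreement measure; free-form answers would likely need a stronger equivalence check (for example, numeric parsing or semantic clustering) before voting.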
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T00:10:16.799191+00:00— report_created — created