Report #15833
[research] LLM answering obscure or out-of-distribution questions with high confidence instead of abstaining
Implement selective answering: calculate a confidence score (e.g., via token probabilities or self-consistency), and if it falls below a threshold, output a standard refusal template like 'I do not have sufficient information to answer this accurately.'
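A minimal sketch of the token-probability variant, assuming the serving stack exposes per-token log-probabilities for the generated answer. The names `sequence_confidence` and `selective_answer` and the 0.75 threshold are hypothetical and chosen for illustration. The score is the geometric mean of the token probabilities, so long answers are not penalized for length alone.

```python
import math
from typing import Sequence

REFUSAL = "I do not have sufficient information to answer this accurately."


def sequence_confidence(token_logprobs: Sequence[float]) -> float:
    """Length-normalized confidence: geometric mean of token probabilities.

    Returns 0.0 for an empty sequence so that a missing signal leads to abstention.
    """
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def selective_answer(
    answer: str,
    token_logprobs: Sequence[float],
    threshold: float = 0.75,  # illustrative value, not a recommended setting
) -> str:
    """Return the answer only if its confidence reaches the threshold."""
    if sequence_confidence(token_logprobs) < threshold:
        return REFUSAL
    return answer
```

The threshold would normally be tuned on a held-out set, trading off how often the model answers against how often those answers are correct.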
Journey Context:
LLMs are often poorly calibrated: the confidence they state does not correlate well with accuracy, which is why obscure or out-of-distribution questions get confident answers instead of abstentions. Self-consistency (sampling multiple outputs and checking agreement) tends to give a better empirical confidence signal than single-shot logits, making it a more dependable basis for an abstention threshold.
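The self-consistency check can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: `sample_fn` is a hypothetical callable standing in for one stochastic model call (temperature above zero), and `n_samples=5`, `threshold=0.6`, and the simple string normalization are placeholder choices. Exact-match agreement suits short factual answers; free-form answers would need a looser equivalence check.

```python
from collections import Counter
from typing import Callable, Tuple

REFUSAL = "I do not have sufficient information to answer this accurately."


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different answers match."""
    return " ".join(text.lower().split())


def self_consistent_answer(
    sample_fn: Callable[[str], str],  # hypothetical: one sampled model answer per call
    question: str,
    n_samples: int = 5,      # illustrative value
    threshold: float = 0.6,  # illustrative value
) -> Tuple[str, float]:
    """Sample several answers and abstain when too few of them agree.

    Returns (answer_or_refusal, agreement), where agreement is the share of
    samples that match the most common normalized answer.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    answers = [normalize(sample_fn(question)) for _ in range(n_samples)]
    top_answer, count = Counter(answers).most_common(1)[0]
    agreement = count / n_samples
    if agreement < threshold:
        return REFUSAL, agreement
    return top_answer, agreement
```

Returning the agreement score alongside the text lets callers log it and later check how well the chosen threshold separates correct from incorrect answers.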
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T01:13:25.063333+00:00— report_created — created