Report #5749
[research] LLM either over-abstains on easy questions or under-abstains and guesses wildly on hard questions
Use logit-based confidence scoring or semantic entropy rather than prompting 'say I don't know if unsure', and set a dynamic threshold to trigger abstention.
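A minimal sketch of the logit-based option, under assumptions the report does not spell out: the inference API is assumed to return per-token log-probabilities, confidence is taken as the length-normalized sequence probability, and "dynamic threshold" is read as a cutoff re-fitted on held-out labeled data rather than hard-coded. The names sequence_confidence and calibrate_threshold are hypothetical, for illustration only.

```python
import math
from typing import List, Tuple


def sequence_confidence(token_logprobs: List[float]) -> float:
    """Length-normalized confidence: exp(mean token log-prob), in (0, 1]."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def calibrate_threshold(scored: List[Tuple[float, bool]],
                        target_accuracy: float) -> float:
    """Pick the lowest confidence cutoff whose answered subset still meets
    target_accuracy on held-out (confidence, was_correct) pairs.

    Returns math.inf (abstain on everything) if no cutoff qualifies.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    best = math.inf
    correct = 0
    for i, (confidence, was_correct) in enumerate(ranked):
        correct += was_correct
        # Only evaluate a cutoff once every answer tied at it is counted.
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != confidence
        if last_of_tie and correct / (i + 1) >= target_accuracy:
            best = confidence
    return best


def should_abstain(token_logprobs: List[float], threshold: float) -> bool:
    """Abstain when the answer's confidence falls below the cutoff."""
    return sequence_confidence(token_logprobs) < threshold
```

Re-running calibrate_threshold per domain or per model version is one way to keep the cutoff from being too strict on easy questions and too loose on hard ones.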
Journey Context:
Prompting a model to say 'I don't know' is unreliable because the model lacks reliable intrinsic metacognition to distinguish what it knows from what it does not. It will often refuse to answer questions it could answer correctly (over-abstention) or confidently hallucinate on questions it cannot (under-abstention). Semantic entropy, which measures how much the meanings of multiple sampled generations diverge, provides a principled signal of when the model's internal knowledge is genuinely uncertain, so abstention is triggered only when that divergence is high.
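The semantic entropy idea can be sketched as follows. This is an illustrative approximation, not a reference implementation: sampled answers are greedily grouped by a caller-supplied same_meaning function, and the entropy is taken over the cluster frequencies. In practice same_meaning would typically be a bidirectional entailment check with an NLI model; the normalized_match helper here is a toy stand-in, and all function names are hypothetical.

```python
import math
from typing import Callable, List


def cluster_by_meaning(answers: List[str],
                       same_meaning: Callable[[str, str], bool]) -> List[List[str]]:
    """Greedy clustering: each answer joins the first cluster whose
    representative (first member) has the same meaning."""
    clusters: List[List[str]] = []
    for answer in answers:
        for cluster in clusters:
            if same_meaning(cluster[0], answer):
                cluster.append(answer)
                break
        else:
            clusters.append([answer])
    return clusters


def semantic_entropy(answers: List[str],
                     same_meaning: Callable[[str, str], bool]) -> float:
    """Shannon entropy (in nats) over the share of samples in each cluster."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    clusters = cluster_by_meaning(answers, same_meaning)
    total = len(answers)
    return -sum((len(c) / total) * math.log(len(c) / total) for c in clusters)


def should_abstain(answers: List[str],
                   same_meaning: Callable[[str, str], bool],
                   threshold: float) -> bool:
    """Abstain when the sampled answers disagree in meaning too much."""
    return semantic_entropy(answers, same_meaning) > threshold


def normalized_match(a: str, b: str) -> bool:
    # Toy stand-in for a real equivalence check (assumption, not an NLI model).
    def norm(s: str) -> str:
        return s.strip().lower().rstrip(".")
    return norm(a) == norm(b)
```

With this sketch, samples that all agree in meaning give an entropy of 0, while N samples with N distinct meanings give log(N), so the threshold sits somewhere between those extremes and would need tuning on held-out data.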
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T22:08:11.758043+00:00— report_created — created