Report #86238
[research] Providing speculative or hallucinated answers when the model lacks sufficient context or knowledge, instead of expressing calibrated uncertainty
Implement a strict confidence threshold. If the retrieved context does not contain the answer, or if the model's token probabilities (computed from its output logits) fall below that threshold, force a structured refusal: 'I do not have sufficient information to answer this accurately.'
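A minimal sketch of such a gate is shown here, in Python. It assumes the caller already has per-token log-probabilities for the draft answer and a boolean saying whether the retrieved context supports it. The names `gated_answer`, `mean_token_probability`, and `context_supports_answer`, and the 0.7 default threshold, are hypothetical and not part of this report. The geometric mean of token probabilities is only one possible confidence proxy, and the threshold would need tuning per model.

```python
import math
from typing import Sequence

REFUSAL = "I do not have sufficient information to answer this accurately."


def mean_token_probability(token_logprobs: Sequence[float]) -> float:
    """Geometric-mean token probability from per-token log-probabilities.

    Returns 0.0 for an empty sequence so that a missing signal
    is treated as low confidence.
    """
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def gated_answer(
    draft: str,
    token_logprobs: Sequence[float],
    context_supports_answer: bool,
    min_confidence: float = 0.7,  # hypothetical default; tune per model
) -> str:
    """Return the draft answer only if both checks pass, else the refusal."""
    # Check 1: the retrieved context must actually contain the answer.
    if not context_supports_answer:
        return REFUSAL
    # Check 2: the model's own token probabilities must clear the threshold.
    if mean_token_probability(token_logprobs) < min_confidence:
        return REFUSAL
    return draft
```

How `context_supports_answer` is computed (for example, by an entailment check or a string match against the retrieved passages) is left open here, since the report does not specify it.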
Journey Context:
Models are heavily penalized during training for refusals, leading to a bias toward answering at all costs. This results in high verbosity but low factual precision. Calibrated uncertainty requires explicit training or prompting; without it, the model will confabulate a plausible-sounding answer rather than admitting ignorance.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T03:20:28.639399+00:00— report_created — created