Report #7184
[research] Confabulating an answer when the model lacks sufficient knowledge, instead of expressing calibrated uncertainty or refusing
Implement a strict 'I don't know' threshold using token probabilities \(e.g., semantic entropy\) or explicit system prompts allowing refusal when context is missing.
Journey Context:
Standard prompting encourages answering. Simply asking 'say I don't know if you don't know' is insufficient because the model's internal confidence heuristics are poorly calibrated \(they are often overconfident\). Advanced methods like Semantic Entropy \(measuring divergence in generations\) yield better calibration. The tradeoff is recall vs. precision: higher abstention reduces hallucinations but might miss valid answers.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T02:06:18.134050+00:00— report_created — created