Report #14552
[research] LLM fails to express calibrated uncertainty, giving high-confidence wrong answers instead of saying 'I don't know'
Use semantic entropy (the entropy of the model's answers after grouping several sampled generations by meaning) to detect hallucinations: if the entropy exceeds a threshold, force a refusal instead of returning the majority answer.
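A minimal Python sketch of this workaround. It assumes the N answers have already been sampled from the model for the same prompt. The names `normalized_match`, `cluster_by_meaning`, `semantic_entropy` and `answer_or_refuse`, the default `threshold=1.0`, and the refusal string are all hypothetical. `normalized_match` is only a placeholder for the semantic-equivalence check. The published semantic-entropy method uses bidirectional entailment (for example with an NLI model), so swap in a real equivalence function before relying on the result.

```python
import math
import string
from typing import Callable, List

# HYPOTHETICAL placeholder for semantic equivalence: exact match after
# lowercasing and stripping punctuation. A real setup would test
# bidirectional entailment (a entails b AND b entails a).
def normalized_match(a: str, b: str) -> bool:
    table = str.maketrans("", "", string.punctuation)
    return a.lower().translate(table).strip() == b.lower().translate(table).strip()


def cluster_by_meaning(answers: List[str],
                       equivalent: Callable[[str, str], bool]) -> List[List[str]]:
    """Greedily put each answer into the first cluster whose first member it matches."""
    clusters: List[List[str]] = []
    for answer in answers:
        for cluster in clusters:
            if equivalent(cluster[0], answer):
                cluster.append(answer)
                break
        else:  # no existing cluster matched, so start a new meaning cluster
            clusters.append([answer])
    return clusters


def semantic_entropy(clusters: List[List[str]]) -> float:
    """Discrete semantic entropy in nats: -sum p(c) ln p(c), with p(c) = |c| / N."""
    n = sum(len(c) for c in clusters)
    return -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)


def answer_or_refuse(samples: List[str],
                     equivalent: Callable[[str, str], bool] = normalized_match,
                     threshold: float = 1.0,  # HYPOTHETICAL; tune on held-out data
                     refusal: str = "I don't know.") -> str:
    """Return a majority-meaning answer, or refuse when semantic entropy is too high."""
    if not samples:
        return refusal
    clusters = cluster_by_meaning(samples, equivalent)
    if semantic_entropy(clusters) > threshold:
        return refusal  # meanings diverge: treat as a likely hallucination
    # Samples largely agree: return a representative of the largest cluster.
    return max(clusters, key=len)[0]


print(answer_or_refuse(["Paris", "paris.", "Paris", "Lyon"]))     # low entropy
print(answer_or_refuse(["Paris", "Lyon", "Marseille", "Nice"]))   # high entropy
```

With N samples the entropy is at most ln N, so a threshold only means something relative to the sample count. It would need to be calibrated on questions with known answers rather than fixed up front.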
Journey Context:
Raw token probabilities are often poorly calibrated: a model can assign 99% probability to an answer that is entirely wrong. Prompting it to "say I don't know if unsure" is insufficient because the model has no reliable awareness of its own knowledge boundaries. Semantic entropy instead samples several answers to the same question, groups them by meaning, and measures how spread out those groups are. Answers that agree in meaning give low entropy. Answers that diverge give high entropy, which signals a likely hallucination and provides a quantitative, principled trigger for abstention.
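The discrete form of the estimator, written out as a worked example. This assumes the N sampled answers have been grouped into meaning clusters C and that each cluster's probability is estimated by its share of the samples (a likelihood-weighted variant also exists). The numbers below are illustrative, not measured results.

```latex
\mathrm{SE}(x) \approx -\sum_{c \in \mathcal{C}} \hat{p}(c \mid x)\,\ln \hat{p}(c \mid x),
\qquad \hat{p}(c \mid x) = \frac{|c|}{N}

% N = 10, all samples agree (one cluster):
\mathrm{SE} = -1 \cdot \ln 1 = 0

% N = 10, samples split 4 / 3 / 3 across three meanings:
\mathrm{SE} = -\left(0.4 \ln 0.4 + 2 \cdot 0.3 \ln 0.3\right) \approx 1.089 \text{ nats}

% Upper bound, N = 10 distinct meanings:
\mathrm{SE}_{\max} = \ln 10 \approx 2.303 \text{ nats}
```

Abstention fires when SE exceeds the chosen threshold. The formula does not determine that threshold, which remains an empirical choice.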
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T21:49:42.344596+00:00 - report_created - created