Report #3984
[research] Relying on an LLM's text output ('I am 90% sure') to gauge factual confidence
Extract token probabilities (logprobs) from the API for the core factual claims. If the logprobs fall below a tuned threshold, trigger a verification step or emit a standardized low-confidence signal rather than trusting the model's self-reported confidence.
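A minimal sketch of that gating step, assuming the per-token logprobs for a claim span have already been pulled out of the API response as a plain list of floats. The names `span_confidence`, `gate_claim`, and `LOW_CONFIDENCE`, and the default threshold of 0.8, are hypothetical and chosen only for illustration; the geometric-mean score is one possible aggregation, not a prescribed one.

```python
import math

# Hypothetical standardized signal; the report does not define a specific value.
LOW_CONFIDENCE = "LOW_CONFIDENCE"


def span_confidence(logprobs):
    """Geometric-mean token probability for the tokens of one factual claim."""
    if not logprobs:
        raise ValueError("span has no tokens")
    # exp(mean of logprobs) equals the geometric mean of the token probabilities.
    return math.exp(sum(logprobs) / len(logprobs))


def gate_claim(claim_text, logprobs, threshold=0.8):
    """Flag the claim when its score falls below the threshold.

    The threshold is an assumed placeholder and needs tuning per model and task.
    """
    score = span_confidence(logprobs)
    status = "OK" if score >= threshold else LOW_CONFIDENCE
    return {"status": status, "claim": claim_text, "score": score}
```

A caller would route any result whose status is `LOW_CONFIDENCE` to a verification step (retrieval, a second model, or a human check) instead of returning the claim as-is.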
Journey Context:
LLMs are poorly calibrated when asked to verbalize their confidence; they often claim high confidence for hallucinated facts. To the model, verbalized uncertainty is just another text generation task, disconnected from the likelihood it actually assigned to the tokens. Logprobs provide a more grounded, quantitative signal of the model's internal state, though they require post-processing and threshold tuning.
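One way to approach the threshold tuning mentioned here is to score a small set of claims whose correctness has been checked by hand, then pick the cutoff that best separates correct from incorrect ones. The sketch below assumes such a labeled set exists; `tune_threshold` is a hypothetical helper, and plain accuracy is used only for simplicity (a deployment that cares more about catching hallucinations might optimize a different metric).

```python
def tune_threshold(scores, labels):
    """Return (threshold, accuracy) for the best cutoff on a labeled set.

    scores: confidence scores in [0, 1], one per claim
    labels: True if the claim was verified correct, False otherwise
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("need equal-length, non-empty inputs")
    best_threshold, best_accuracy = 0.0, -1.0
    # Try each observed score as a cutoff; claims scoring >= cutoff are accepted.
    for candidate in sorted(set(scores)):
        hits = sum((s >= candidate) == y for s, y in zip(scores, labels))
        accuracy = hits / len(scores)
        # Strict comparison keeps the lowest cutoff when several tie.
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = candidate, accuracy
    return best_threshold, best_accuracy
```

The chosen cutoff is only as good as the labeled sample, so it is worth re-tuning when the model, prompt, or domain changes.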
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T18:37:25.537026+00:00— report_created — created