Report #10215
[research] Relying on a model's verbalized confidence to gauge factual accuracy
Extract the token log-probabilities (logprobs) for the core factual tokens. If the probability mass at those positions is spread across many candidate tokens (high entropy), or the top token's probability is below a threshold (e.g., < 0.8), trigger a fallback (e.g., web search or 'I don't know'), regardless of how confident the model sounds in natural language.
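A minimal sketch of that gating rule, assuming the provider returns, for each generated position, a mapping of the top-k candidate tokens to their logprobs, and that an upstream step (not shown, hypothetical) has already picked out the positions that carry the core factual tokens. The names `token_stats`, `should_fall_back`, and `max_entropy` are illustrative only. The 0.8 probability threshold comes from the report. The 0.5-nat entropy cutoff is a placeholder assumption that would need tuning.

```python
import math
from typing import Dict, Sequence, Tuple


def token_stats(top_logprobs: Dict[str, float]) -> Tuple[float, float]:
    """Return (top-token probability, entropy in nats) for one position.

    The top-token probability is exp(logprob), i.e. the probability the model
    assigned over its full vocabulary. The entropy is computed only over the
    returned top-k alternatives, renormalized to sum to 1, so it approximates
    the full-vocabulary entropy rather than reproducing it exactly.
    """
    probs = [math.exp(lp) for lp in top_logprobs.values()]
    top_prob = max(probs)
    total = sum(probs)
    normalized = [p / total for p in probs]
    entropy = -sum(p * math.log(p) for p in normalized if p > 0)
    return top_prob, entropy


def should_fall_back(
    factual_positions: Sequence[Dict[str, float]],
    min_top_prob: float = 0.8,   # threshold from the report
    max_entropy: float = 0.5,    # hypothetical cutoff in nats; tune per model
) -> bool:
    """Return True if any factual token looks uncertain.

    The decision uses logprobs only; the wording of the answer is ignored.
    """
    for top_logprobs in factual_positions:
        top_prob, entropy = token_stats(top_logprobs)
        if top_prob < min_top_prob or entropy > max_entropy:
            return True
    return False


# Usage: a peaked distribution passes, a spread-out one triggers the fallback.
confident = [{"Paris": math.log(0.97), "Lyon": math.log(0.02), "Nice": math.log(0.01)}]
uncertain = [{"1947": math.log(0.45), "1948": math.log(0.35), "1946": math.log(0.20)}]
print(should_fall_back(confident))
print(should_fall_back(uncertain))
```

Gating on the single least certain factual token, rather than averaging across the answer, is one design choice here: averaging can let one shaky date or name hide behind many confident filler tokens.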
Journey Context:
LLMs are trained to sound helpful and authoritative, so their verbalized confidence is often poorly calibrated against actual correctness: a model will state a wrong fact just as confidently as a correct one. Logprobs give a more direct view of the model's output probability distribution over tokens (not its internal weights). High entropy in that distribution at positions holding factual entities (names, dates) tends to correlate with hallucination risk.
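One way to make "high entropy" concrete, assuming only the top-k alternatives and their logprobs $\ell_1, \dots, \ell_k$ are available at a position $t$ (the usual situation with hosted APIs). The renormalization over the top-k set is an approximation introduced here, and the entropy cutoff $\eta$ is a hypothetical parameter; $\tau = 0.8$ is the threshold from the report.

```latex
p_i = e^{\ell_i}, \qquad
\tilde{p}_i = \frac{p_i}{\sum_{j=1}^{k} p_j}, \qquad
H_t = -\sum_{i=1}^{k} \tilde{p}_i \log \tilde{p}_i

\text{fallback at } t \iff \max_i p_i < \tau \;\lor\; H_t > \eta, \qquad \tau = 0.8
```

For two equally likely candidates, $H_t = \log 2 \approx 0.69$ nats; a near-certain token gives $H_t$ close to 0.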
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T10:09:20.731313+00:00— report_created — created