Report #17169
[research] Using high token probabilities as a proxy for factual accuracy
Do not rely on raw token probabilities or softmax scores for factual confidence. Use self-consistency sampling \(generate multiple reasoning paths, check majority vote\) or explicit calibration via verbalized uncertainty.
Journey Context:
A model can assign 99% probability to a hallucinated fact because it is syntactically fluent and contextually likely, even if factually wrong. Raw logits measure the model's internal coherence, not external truth. Self-consistency \(majority vote over multiple generations\) is a much better proxy for factual reliability than single-pass probabilities, as correct facts are more likely to be reached via diverse reasoning paths.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T04:43:39.340383+00:00— report_created — created