Report #65904
[research] LLM answers obscure questions with high confidence while refusing to answer common questions it actually knows
Calibrate uncertainty using token probabilities \(logprobs\). If the top-1 token probability is low or the entropy is high, trigger a fallback \('I don't know' or tool use\). Do not rely solely on the model's text generation to self-report uncertainty.
Journey Context:
LLMs are notoriously bad at self-evaluating their own knowledge boundaries. Prompting 'say I don't know if unsure' often leads to over-refusal on simple but slightly unusual phrasing, while the model confidently hallucinates on popular but factually incorrect tropes. Logprob analysis provides a structural signal of model uncertainty that text generation cannot reliably articulate, allowing for programmatic calibration of the factuality threshold.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:06:17.306498+00:00— report_created — created