Report #62315
[research] LLM attempts to answer highly obscure or internal-specific questions with high confidence instead of admitting ignorance
Calibrate the model's confidence by asking it to emit a probability score or choose an explicit 'I don't know' option, then enforce a strict threshold: the agent must escalate to a human or halt whenever the confidence signal (the self-reported probability, or the token logprob margin where logprobs are available) falls below that threshold.
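A minimal sketch of the threshold gate described above, assuming the model has been prompted to reply with a JSON object containing an "answer" and a "confidence" in [0, 1]. The field names, the `gate` function, the `Decision` type, and the 0.8 default are all hypothetical and not part of this report; the self-reported score is itself uncalibrated until checked against held-out data.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    action: str            # "answer" or "escalate"
    answer: Optional[str]  # populated only when action == "answer"
    confidence: float


def gate(raw_response: str, threshold: float = 0.8) -> Decision:
    """Decide whether to pass an answer through or escalate to a human.

    Assumes (hypothetically) that the model replied with JSON such as
    {"answer": "...", "confidence": 0.93}. Anything malformed escalates.
    """
    try:
        payload = json.loads(raw_response)
        answer = payload.get("answer")
        confidence = float(payload.get("confidence", 0.0))
    except (json.JSONDecodeError, TypeError, ValueError, AttributeError):
        # Unparseable output is treated as "no confidence at all".
        return Decision("escalate", None, 0.0)

    # Out-of-range or NaN scores fail this check and escalate.
    if not (0.0 <= confidence <= 1.0):
        return Decision("escalate", None, 0.0)

    # An explicit abstention always escalates, whatever the score says.
    if not isinstance(answer, str) or answer.strip().lower() == "i don't know":
        return Decision("escalate", None, confidence)

    if confidence < threshold:
        return Decision("escalate", None, confidence)

    return Decision("answer", answer, confidence)
```

Failing closed (escalating on any parse error or odd value) is a design choice in this sketch: for obscure or internal-specific questions, a missed answer is assumed to be cheaper than a confident wrong one.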
Journey Context:
LLMs are often poorly calibrated: their stated confidence does not correlate well with their actual accuracy, and they are trained to always attempt an answer. To fix this, one must explicitly train or prompt for abstention (selective prediction). The journey is moving from 'always answer' to 'answer only when confident', which requires defining an abstention budget and using techniques such as conformal prediction or thresholding on self-evaluated probabilities.
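One way the threshold and abstention budget might be chosen is sketched below. It assumes a held-out set of questions where each record pairs the model's self-evaluated probability with whether its answer was actually correct. The function name `pick_threshold` and the parameters `target_accuracy` and `max_abstain_rate` are hypothetical. This is plain empirical thresholding, not conformal prediction: it gives no finite-sample coverage guarantee, and the chosen threshold should be re-validated on fresh data.

```python
from typing import List, Optional, Tuple


def pick_threshold(
    records: List[Tuple[float, bool]],
    target_accuracy: float = 0.9,
    max_abstain_rate: float = 0.5,
) -> Optional[Tuple[float, float, float]]:
    """Find the lowest confidence threshold that meets the accuracy target.

    records: (self_reported_confidence, was_correct) pairs from a held-out set.
    Returns (threshold, accuracy_on_answered, abstain_rate), or None when no
    threshold satisfies both the accuracy target and the abstention budget.
    """
    if not records:
        return None

    total = len(records)
    # Try thresholds in ascending order so the first hit abstains the least.
    for threshold in sorted({conf for conf, _ in records}):
        answered = [ok for conf, ok in records if conf >= threshold]
        abstain_rate = 1 - len(answered) / total
        if abstain_rate > max_abstain_rate:
            # Higher thresholds can only abstain more, so stop searching.
            break
        accuracy = sum(answered) / len(answered)
        if accuracy >= target_accuracy:
            return threshold, accuracy, abstain_rate
    return None


# Illustrative, made-up calibration data (not measurements from this report).
calibration = [
    (0.95, True), (0.90, True), (0.85, True),
    (0.60, False), (0.55, True), (0.30, False),
]
result = pick_threshold(calibration, target_accuracy=0.9, max_abstain_rate=0.5)
```

A `None` result is informative in its own right: it means the confidence signal cannot separate right from wrong answers well enough within the budget, so either the budget must grow or a better signal is needed.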
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:05:01.824545+00:00— report_created — created