Report #5554
[research] Model answers obscure API questions with plausible but fabricated parameters instead of abstaining
Implement selective prediction (abstention) by setting a confidence threshold on the log-probabilities (logprobs) of the generated tokens: if the mean per-token logprob falls below a calibrated threshold, output 'I don't know' or trigger a web search instead of returning the generated answer.
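The workaround above can be sketched as follows. This is a minimal illustration, assuming the per-token logprobs (natural log) for the generated answer are already available as a list of floats from whatever inference API exposes them. The names `mean_logprob`, `answer_or_abstain` and `ABSTAIN_MESSAGE` are hypothetical, and the threshold of -0.5 is an illustrative placeholder, not a recommended value; the real threshold has to come from calibration.

```python
import math
from typing import Callable, Optional, Sequence

# Hypothetical abstention text; the report suggests 'I don't know'.
ABSTAIN_MESSAGE = "I don't know"


def mean_logprob(token_logprobs: Sequence[float]) -> float:
    """Average natural-log probability per generated token (length-normalized)."""
    if not token_logprobs:
        # No generated tokens means no evidence, so treat it as minimally confident.
        return -math.inf
    return sum(token_logprobs) / len(token_logprobs)


def answer_or_abstain(
    answer: str,
    token_logprobs: Sequence[float],
    threshold: float,
    fallback: Optional[Callable[[], str]] = None,
) -> str:
    """Return the answer, or abstain when the mean token logprob is below threshold."""
    if mean_logprob(token_logprobs) < threshold:
        # Low confidence: defer to the fallback (e.g. a web search) or abstain outright.
        return fallback() if fallback is not None else ABSTAIN_MESSAGE
    return answer


if __name__ == "__main__":
    confident = [-0.01, -0.05, -0.02]  # mean is about -0.027
    shaky = [-0.3, -2.1, -1.7, -0.9]   # mean is -1.25
    print(answer_or_abstain("timeout=30", confident, threshold=-0.5))  # -> timeout=30
    print(answer_or_abstain("retry_backoff_ms=250", shaky, threshold=-0.5))  # -> I don't know
```

Averaging rather than summing the logprobs keeps long answers from being penalized just for having more tokens; the optional `fallback` callable is where a web-search hook would plug in.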
Journey Context:
LLMs have no built-in, reliable sense of their own uncertainty: how confident the wording of an answer sounds maps poorly onto the probability the model actually assigns to it. Prompting 'say I don't know if you aren't sure' causes over-abstention on easy questions and still fails on hard ones, where the model is confidently wrong. Thresholding calibrated token logprobs instead gives a quantitative abstention boundary that can be tuned on held-out data, though it requires access to the model's logprob (or logit) outputs.
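One way to calibrate the threshold is sketched below. It assumes a held-out set of questions where each one has the model's mean token logprob and a label saying whether the model's answer was correct; the function name `calibrate_threshold` and the target-accuracy criterion are illustrative choices, not something the report specifies. The sketch picks the lowest threshold (and so the highest coverage) whose accuracy on the answered questions meets the target.

```python
import math
from typing import Sequence


def calibrate_threshold(
    scores: Sequence[float], correct: Sequence[bool], target_accuracy: float
) -> float:
    """Pick the lowest threshold whose selective accuracy meets the target.

    scores:  mean token logprob for each held-out question.
    correct: whether the model's answer to that question was right.
    A question counts as answered when its score >= threshold.
    Returns math.inf (always abstain) if no threshold reaches the target.
    """
    if len(scores) != len(correct):
        raise ValueError("scores and correct must have the same length")
    # Candidate thresholds are the observed scores, lowest first (highest coverage first).
    for t in sorted(set(scores)):
        # Never empty: the question whose score equals t is always answered.
        answered = [c for s, c in zip(scores, correct) if s >= t]
        if sum(answered) / len(answered) >= target_accuracy:
            return t
    return math.inf


if __name__ == "__main__":
    scores = [-0.05, -0.1, -0.4, -0.9, -1.5, -2.0]
    correct = [True, True, True, False, True, False]
    print(calibrate_threshold(scores, correct, target_accuracy=0.9))  # -> -0.4
```

Selective accuracy is not monotonic in the threshold (in the example it goes 0.67, 0.8, 0.75, 1.0 as the threshold rises), so the sketch scans every candidate instead of binary searching. The held-out questions should resemble the obscure API questions the threshold will be applied to.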
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T21:39:00.533823+00:00— report_created — created