Report #1737
[research] Agent providing a confident but incorrect answer when it lacks sufficient parametric knowledge
Have the model output a calibrated confidence score, or a specific 'I don't know' token, when the probability distribution over tokens is flat (high entropy), and structure prompts so that 'unknown' is explicitly permitted as a valid answer.
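One way to make "flat" concrete is to measure the entropy of the probabilities assigned to candidate answers and abstain above a cutoff. The sketch below is an illustration under assumptions, not a method from this report: the function names, the normalization by log(n), and the 0.8 threshold are all hypothetical, and the probabilities are assumed to come from whatever logprob output the model exposes. Raw token probabilities are not a calibrated measure of factual accuracy, so treat this as a heuristic filter only.

```python
import math

def normalized_entropy(probs):
    """Shannon entropy of a distribution, scaled to [0, 1] by log(len(probs))."""
    if len(probs) < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))

def answer_or_abstain(candidates, probs, threshold=0.8):
    """Return the top candidate, or "I don't know" when the distribution is near-flat.

    threshold is a hypothetical cutoff; it would need tuning on held-out data.
    """
    if normalized_entropy(probs) >= threshold:
        return "I don't know"
    # Pick the candidate with the highest probability.
    return max(zip(candidates, probs), key=lambda cp: cp[1])[0]
```

For example, `answer_or_abstain(["Paris", "Lyon"], [0.95, 0.05])` returns the top candidate, while a uniform distribution over the same candidates triggers the abstention string.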
Journey Context:
By default, an LLM generates the most probable next token, which often yields a fluent but factually ungrounded guess. Models are also poorly calibrated out of the box: a high token probability does not imply high factual accuracy. Getting a model to abstain (say 'I don't know') therefore requires either explicit fine-tuning on data that contains abstentions or a self-consistency check (sample the same question several times and treat high variance across the answers as a signal to abstain).
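The self-consistency check can be sketched as follows. This is a minimal illustration under assumptions: `sample_fn` is a hypothetical callable standing in for a model call with sampling enabled (temperature above zero), answers are compared by exact match after lowercasing, and the `min_agreement` value of 0.6 is an arbitrary placeholder. Free-form answers would likely need a looser comparison than exact string equality.

```python
from collections import Counter

def self_consistent_answer(sample_fn, question, n=5, min_agreement=0.6):
    """Sample n answers and abstain when they disagree too much.

    sample_fn: hypothetical callable (question -> answer string) that
               draws one stochastic sample from the model.
    Returns (answer, agreement), where agreement is the share of samples
    that match the most common answer.
    """
    # Normalize lightly so trivial formatting differences do not count as variance.
    answers = [sample_fn(question).strip().lower() for _ in range(n)]
    top, count = Counter(answers).most_common(1)[0]
    agreement = count / n
    if agreement < min_agreement:
        return "I don't know", agreement
    return top, agreement
```

The agreement ratio can also be surfaced to the caller as a rough confidence score, though it measures consistency across samples and not factual correctness: a model can be consistently wrong.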
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T06:55:12.092899+00:00— report_created — created