Report #53509
[research] Agent answers low-confidence factual questions with high confidence instead of abstaining
Implement calibrated abstention. Instruct the agent to output 'I don't know' or request clarification when it cannot find the answer in the provided documentation, and enforce this with logprob or concordance checks (agreement across repeated samples of the same question) where available.
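A minimal sketch of a concordance check, assuming the agent can be sampled several times on the same question and that answers are short strings. The names `concordant_answer`, `normalize`, and the `min_agreement` threshold of 0.6 are illustrative assumptions, not part of this report or of any particular framework.

```python
from collections import Counter

ABSTAIN = "I don't know"


def normalize(answer: str) -> str:
    """Lowercase, collapse whitespace, and drop trailing periods so that
    trivially different strings compare equal."""
    return " ".join(answer.lower().split()).rstrip(".")


def concordant_answer(samples: list[str], min_agreement: float = 0.6) -> str:
    """Return the majority answer only if enough samples agree on it.

    `samples` holds independent answers to the same question (hypothetical
    input; how they are produced is up to the caller).
    """
    if not samples:
        return ABSTAIN
    counts = Counter(normalize(s) for s in samples)
    top, votes = counts.most_common(1)[0]
    # Abstain if the model itself mostly abstained, or if agreement is weak.
    if top == normalize(ABSTAIN) or votes / len(samples) < min_agreement:
        return ABSTAIN
    # Return the first original sample that matches the winning form.
    return next(s for s in samples if normalize(s) == top)


if __name__ == "__main__":
    print(concordant_answer(["Paris", "paris.", "Paris", "Lyon"]))
    print(concordant_answer(["1987", "1992", "1979"]))
```

Exact string matching after normalization only suits short factual answers; longer free-text answers would need a different similarity measure, and the threshold should be tuned on questions with known answers.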
Journey Context:
LLMs are poorly calibrated: their stated confidence does not correlate well with actual accuracy, and they answer obscure questions with the same fluency as common ones. Allowing an agent to say 'I don't know' (abstention) is crucial for factuality, because a rejected action is safer than a hallucinated one. This requires explicit permission in the system prompt, since default RLHF behavior penalizes refusal.
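As a rough illustration of the two pieces mentioned here, the sketch below pairs a system prompt that explicitly permits abstention with a simple logprob gate. It assumes the serving API exposes per-token log probabilities for the generated answer; the prompt wording, the function names, and the 0.8 threshold are all hypothetical and would need tuning against labeled data.

```python
import math

ABSTAIN = "I don't know"

# Hypothetical prompt text: grants explicit permission to abstain.
SYSTEM_PROMPT = (
    "Answer only from the provided documentation. "
    "If the documentation does not contain the answer, reply exactly "
    f"\"{ABSTAIN}\" or ask a clarifying question. "
    "Abstaining is always preferred over guessing."
)


def mean_token_probability(token_logprobs: list[float]) -> float:
    """Geometric mean of the token probabilities (exp of the mean logprob)."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def gate_answer(answer: str, token_logprobs: list[float],
                threshold: float = 0.8) -> str:
    """Replace the answer with an abstention when token confidence is low.

    `token_logprobs` are natural-log probabilities of the answer tokens,
    assumed to be supplied by the model API.
    """
    if mean_token_probability(token_logprobs) < threshold:
        return ABSTAIN
    return answer


if __name__ == "__main__":
    print(gate_answer("Paris", [-0.01, -0.02]))
    print(gate_answer("1987", [-1.2, -0.9]))
```

Because token probabilities are themselves imperfectly calibrated, this gate is a heuristic filter rather than a guarantee, and it works best alongside the prompt-level permission rather than as a replacement for it.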
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T20:18:40.791427+00:00— report_created — created