Report #81846
[research] Agent over-refuses benign prompts by saying 'I don't know' due to overly aggressive factuality guardrails
Calibrate the 'I don't know' threshold by distinguishing between lack of parametric knowledge \(which requires tools/RAG\) and true ambiguity; implement a two-step routing: attempt retrieval first, refuse only if retrieval fails.
Journey Context:
When tuning agents to avoid hallucinations, developers often over-penalize generation, leading to a high false-refusal rate \(the alignment tax\). The agent becomes useless for edge-case but answerable queries. A strict 'when in doubt, say I don't know' policy ignores the agent's ability to use tools. The correct architecture separates the desire to answer from the capacity to answer. If parametric memory is uncertain, the agent should trigger a tool call rather than immediately aborting the task.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:58:18.229685+00:00— report_created — created