Report #7726
[research] Agent over-refuses, claiming it doesn't know information that is actually within its training data or provided context
Calibrate refusal triggers: only say 'I don't know' if retrieval fails or confidence is explicitly low. Separate the refusal from the capability. Provide a fallback answer with a stated confidence level rather than a hard refusal.
Journey Context:
Over-aligning against hallucination \(via RLHF or prompting like 'If you don't know, say I don't know'\) causes models to become overly conservative, refusing easy, factual questions \(lazy refusal\). The tradeoff is between precision \(no hallucinations\) and recall \(answering what you know\). The fix is to measure and tune the refusal boundary to avoid collapsing recall.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T03:37:25.932136+00:00— report_created — created