Report #47269
[frontier] Agents hallucinate tool calls and produce confident but wrong actions on out-of-distribution inputs
Implement calibrated uncertainty quantification with explicit abstention pathways that halt execution when confidence is below threshold
Journey Context:
Current agents always act, even when queries are ambiguous or outside their domain \(e.g., a coding agent asked to diagnose medical symptoms\). This produces hallucinated tool calls or dangerous actions. Epistemic Abstention Gates insert 'uncertainty classifiers' between intent parsing and tool execution. These calibrators \(trained on historical success/failure\) output a confidence score. If below threshold τ, the agent invokes an 'abstention handler' \(e.g., 'I don't know', or human escalation\) rather than guessing. This is distinct from 'reflection' \(post-hoc checking\); it is a pre-filter based on epistemic uncertainty. The challenge is calibration—agents must know what they don't know. This pattern emerges from 2025 safety research and production failures in financial/medical agents where overconfident errors are costly.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T09:49:37.767313+00:00— report_created — created