Report #28921
[architecture] Agents hallucinate high confidence scores, making automated confidence thresholds unreliable for triggering human-in-the-loop (HITL) escalation
Derive escalation triggers from structural verification (e.g., missing required schema fields, tool execution errors, or deviation from expected tool sequence) rather than relying on the LLM's self-reported confidence score.
Journey Context:
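It is tempting to ask an agent 'How confident are you from 1-10?' and escalate if the score is below 8. In practice, LLM self-reported confidence is often poorly calibrated: agents tend to report high scores even when the answer is wrong, so the threshold rarely fires when it matters. Structural checks, such as 'Did the database query return an empty result?' or 'Did the validation schema fail?', are deterministic and do not depend on the model's self-assessment. The tradeoff is that you must define these structural failure modes upfront, and any failure mode you did not anticipate will not trigger escalation. In exchange, a 'confidently wrong' agent cannot bypass HITL simply by reporting a high score.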
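A minimal sketch of what such structural triggers could look like, in Python. Everything named here is an assumption for illustration and not part of the original report: the `AgentRun` and `ToolCall` records, the `REQUIRED_FIELDS` schema, and the `EXPECTED_TOOL_SEQUENCE` are hypothetical and would need to be replaced with whatever your agent framework actually records. The self-reported score is stored but deliberately never consulted.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Hypothetical schema and tool plan; replace with your own definitions.
REQUIRED_FIELDS = ("customer_id", "amount", "currency")
EXPECTED_TOOL_SEQUENCE = ("lookup_customer", "fetch_invoice", "issue_refund")


@dataclass
class ToolCall:
    name: str
    result: Any = None
    error: Optional[str] = None


@dataclass
class AgentRun:
    output: dict
    tool_calls: list = field(default_factory=list)
    # Kept for logging only; escalation never reads this value.
    self_reported_confidence: Optional[int] = None


def escalation_reasons(run: AgentRun) -> list:
    """Return every structural failure found in the run (empty list = none)."""
    reasons = []

    # Check 1: required output fields are present and non-null.
    missing = [f for f in REQUIRED_FIELDS if run.output.get(f) is None]
    if missing:
        reasons.append("missing required fields: " + ", ".join(missing))

    # Check 2: no tool call errored or came back empty.
    for call in run.tool_calls:
        if call.error is not None:
            reasons.append(f"tool error in {call.name}: {call.error}")
        elif call.result in (None, [], {}):
            reasons.append(f"empty result from {call.name}")

    # Check 3: the tools were called in the expected order.
    actual = tuple(call.name for call in run.tool_calls)
    if actual != EXPECTED_TOOL_SEQUENCE:
        reasons.append("unexpected tool sequence: " + (" -> ".join(actual) or "(none)"))

    return reasons


def needs_human(run: AgentRun) -> bool:
    """Escalate to a human if any structural check failed."""
    return bool(escalation_reasons(run))


if __name__ == "__main__":
    run = AgentRun(
        output={"customer_id": "c1", "amount": 10},
        tool_calls=[
            ToolCall("lookup_customer", result={"id": "c1"}),
            ToolCall("fetch_invoice", result=[]),
        ],
        self_reported_confidence=10,
    )
    for reason in escalation_reasons(run):
        print(reason)
```

In the usage example the agent reports a confidence of 10, yet the run is still escalated because a required field is missing, one tool returned an empty result, and the expected final tool was never called. Returning the list of reasons, instead of a bare boolean, gives the human reviewer a starting point.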
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T02:56:21.636813+00:00— report_created — created