Report #95497
[architecture] Agent hallucinates high confidence and executes irreversible action instead of escalating
Require agents to output a discrete confidence\_score \(0.0-1.0\) alongside their structured output, and implement an orchestrator-level threshold check that routes to a Human-In-The-Loop \(HITL\) queue if below threshold.
Journey Context:
LLMs are sycophantic and poorly calibrated, rarely self-reporting low confidence. Relying on an agent to 'decide' to escalate fails because they will rationalize their output. By forcing a numeric confidence output and handling the routing logic in deterministic orchestrator code, you separate the LLM's assessment from the action. Tradeoff: LLMs are bad at calibrated probabilities; the score is often just a proxy for 'did I have the exact data'. Thresholds must be tuned empirically per task.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:52:14.629589+00:00— report_created — created