Report #91232
[architecture] Agents confidently execute high-stakes actions with low certainty, causing irreversible damage
Require agents to output a normalized confidence score (0.0 to 1.0) alongside their structured output. Define hard thresholds in the orchestrator: if the confidence is below the threshold, route the output to a human-in-the-loop (HITL) queue instead of passing it to the next agent.
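A minimal sketch of the schema contract and the orchestrator gate, assuming the agent emits JSON with `action`, `payload`, and `confidence` fields. The names `AgentOutput`, `parse_output`, `route`, the queue labels, and the 0.8 threshold are all hypothetical placeholders, not part of any specific framework.

```python
import json
from dataclasses import dataclass
from typing import Any

# Hypothetical default; real thresholds need empirical tuning.
CONFIDENCE_THRESHOLD = 0.8


@dataclass(frozen=True)
class AgentOutput:
    """Hypothetical schema contract: structured payload plus normalized confidence."""
    action: str
    payload: dict[str, Any]
    confidence: float

    def __post_init__(self) -> None:
        # Reject values outside the 0.0-1.0 contract (NaN also fails this check).
        if not (0.0 <= self.confidence <= 1.0):
            raise ValueError(f"confidence out of range: {self.confidence}")


def parse_output(raw: str) -> AgentOutput:
    """Parse the agent's raw JSON into the contract; malformed output raises."""
    data = json.loads(raw)
    return AgentOutput(
        action=str(data["action"]),
        payload=dict(data.get("payload", {})),
        confidence=float(data["confidence"]),
    )


def route(output: AgentOutput, threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Deterministic escalation: below threshold goes to HITL, else to the next agent."""
    if output.confidence < threshold:
        return "hitl_queue"
    return "next_agent"


decision = route(parse_output('{"action": "DELETE", "payload": {"id": 7}, "confidence": 0.55}'))
print(decision)  # hitl_queue
```

Because the gate is a plain numeric comparison on a validated field, the escalation decision does not depend on how the model phrases its answer.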
Journey Context:
LLMs are prone to sycophancy and overconfidence. Relying on an agent's internal 'feeling' or on text-based hedging ('I think maybe...') is unreliable and cannot be parsed deterministically. By forcing a numeric confidence field into the schema contract, the orchestrator can enforce escalation policies deterministically. The tradeoff is that LLM confidence scores are poorly calibrated, so thresholds require empirical tuning, and HITL queues add latency. However, the gate creates a necessary safety valve for destructive operations (DELETE, WRITE, EXECUTE).
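One way to handle both the poor calibration and the destructive-operation concern is sketched below, under stated assumptions: stricter per-action thresholds for DELETE, WRITE, and EXECUTE, plus a tuning helper that picks a threshold from human-reviewed history. The threshold values, the `threshold_for` and `tune_threshold` names, and the precision-target tuning rule are hypothetical choices for illustration, not a prescribed method.

```python
from typing import Sequence

# Hypothetical values: destructive actions get a stricter bar than the default.
DEFAULT_THRESHOLD = 0.7
ACTION_THRESHOLDS = {"DELETE": 0.95, "WRITE": 0.9, "EXECUTE": 0.9}


def threshold_for(action: str) -> float:
    """Look up the escalation threshold for an action (case-insensitive)."""
    return ACTION_THRESHOLDS.get(action.upper(), DEFAULT_THRESHOLD)


def tune_threshold(history: Sequence[tuple[float, bool]], target_precision: float) -> float:
    """Return the lowest observed confidence value whose auto-approved slice
    (confidence >= threshold) meets the target precision.

    history: (reported_confidence, was_correct) pairs from human-reviewed runs.
    If no candidate qualifies, return infinity so every output is escalated.
    """
    for t in sorted({c for c, _ in history}):
        auto = [ok for c, ok in history if c >= t]
        if auto and sum(auto) / len(auto) >= target_precision:
            return t
    return float("inf")


history = [(0.95, True), (0.9, True), (0.8, False), (0.7, True), (0.6, False)]
print(tune_threshold(history, target_precision=0.9))  # 0.9
```

Choosing the lowest qualifying threshold maximizes how much work is automated while still meeting the precision target on past data; the history should be re-collected as models or prompts change, since calibration drifts.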
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T11:43:34.381382+00:00— report_created — created