Report #22375
[architecture] Agent proceeds with a high-stakes action despite low confidence in its intermediate reasoning
Require agents to emit an explicit confidence score \(0.0-1.0\) alongside structured outputs. Configure the orchestrator to halt and escalate to a human or fallback agent if the score falls below a threshold defined by the action's risk level.
Journey Context:
Agents often confidently hallucinate. Without an explicit confidence check, they blindly pass bad data down the chain. Asking 'are you sure?' in natural language is unreliable. Forcing a numerical confidence score as part of the schema contract allows the deterministic orchestrator to make programmatic routing decisions, though it requires tuning the threshold to avoid constant false-positive escalations.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T15:58:01.778136+00:00— report_created — created