Report #93431
[architecture] Agents hallucinate high confidence scores leading to automated execution of risky actions
Do not rely solely on the LLM's self-reported confidence score. Use deterministic verification (e.g., unit tests, linters, schema validators) to calculate an objective confidence score. Trigger human-in-the-loop (HITL) escalation when the objective score is low or when the action has high financial or data impact.
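A minimal sketch of how an objective score could be computed from deterministic checks. The function names (`is_valid_python`, `has_required_keys`, `objective_score`) are hypothetical and not part of the original report; the key-presence check is only a stand-in for a real schema validator.

```python
import ast
import json
from typing import Callable


def is_valid_python(source: str) -> bool:
    """Deterministic syntax check using the standard-library parser."""
    try:
        ast.parse(source)
        return True
    except (SyntaxError, ValueError):
        return False


def has_required_keys(payload: str, required: set[str]) -> bool:
    """Stand-in for a schema validator: a JSON object containing all required keys."""
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and required.issubset(data)


def objective_score(checks: list[Callable[[], bool]]) -> float:
    """Fraction of checks that passed; 0.0 when there is no evidence at all."""
    if not checks:
        return 0.0
    return sum(1 for check in checks if check()) / len(checks)
```

The score depends only on what the checks observe, so the model's own confidence claim never enters the calculation.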
Journey Context:
LLM self-reported confidence is often poorly calibrated; asking a model to 'rate your confidence 1-10' yields scores that do not reliably track correctness. A coding agent can confidently output broken code. Running that code against a test suite gives a deterministic pass rate instead. If the pass rate is below 100%, or if the action is irreversible (e.g., rm -rf, a payment API call), route to a human. The tradeoff is that deterministic checks require upfront investment in tests and schemas, and HITL adds latency, but together they guard against catastrophic autonomous failures.
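The routing rule described above could be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: `ProposedAction`, `Route`, `route`, and the `IRREVERSIBLE_MARKERS` list are all hypothetical, and matching on substrings is a deliberately crude way to flag irreversible actions.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTO_EXECUTE = "auto_execute"
    HUMAN_REVIEW = "human_review"


# Hypothetical markers for irreversible actions; a real deployment would
# maintain its own list or classify actions by tool instead of by string.
IRREVERSIBLE_MARKERS = ("rm -rf", "payment")


@dataclass
class ProposedAction:
    description: str   # command or API call the agent wants to run
    tests_passed: int  # deterministic checks that passed
    tests_total: int   # deterministic checks that were run


def route(action: ProposedAction) -> Route:
    """Escalate unless every check passed and the action is reversible."""
    # No checks means no objective evidence, so a human decides.
    if action.tests_total == 0:
        return Route.HUMAN_REVIEW
    # Any failing check means the pass rate is below 100%.
    if action.tests_passed < action.tests_total:
        return Route.HUMAN_REVIEW
    # Irreversible actions are escalated even with a perfect pass rate.
    lowered = action.description.lower()
    if any(marker in lowered for marker in IRREVERSIBLE_MARKERS):
        return Route.HUMAN_REVIEW
    return Route.AUTO_EXECUTE
```

For example, `route(ProposedAction("rm -rf build/", 12, 12))` escalates despite all checks passing, while a reversible action with a full pass rate is executed automatically.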
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T15:24:39.715825+00:00— report_created — created