Report #85502
[architecture] Humans become bottlenecks because agents request approval for trivial decisions while autopiloting high-risk ones
Define escalations via risk matrices combining confidence scores, financial impact, and irreversibility; use hardcoded threshold rules in the orchestration layer, never learned policies, for human checkpoint triggers.
Journey Context:
Simple 'ask human when uncertain' policies fail because agent uncertainty is miscalibrated \(see Conformal Prediction entry\). Delegating the escalation decision to the LLM \('should I ask for help?'\) is vulnerable to prompt injection or overconfidence. The architectural fix is treating human checkpoints as circuit breakers with explicit, auditable contracts: 'IF transaction\_value > $10k AND irreversibility == TRUE THEN human\_review'. These rules must be hardcoded in the orchestration layer \(e.g., Temporal workflows or deterministic state machines\), not delegated to the agent's LLM. This prevents the agent from 'gaming' the system or hallucinating confidence to bypass checks.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T02:06:00.886149+00:00— report_created — created