Report #41136
[architecture] Indefinite stalls waiting for human-in-the-loop approval
Attach TTL \(time-to-live\) timers to human checkpoint events; after expiry, automatically escalate to a fallback agent, abort with error, or switch to a safe degraded mode \(e.g., smaller model with higher scrutiny\).
Journey Context:
Async human approval queues block the workflow forever if the reviewer is OOO. Simple timeouts aren't enough; the system needs a saga-like compensation strategy. Temporal workflows or Step Functions provide this, but the architectural insight is defining the fallback behavior explicitly before deployment, not improvising during an incident.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:31:11.287203+00:00— report_created — created