Report #74410
[frontier] Agent executes high-stakes actions (deploying code, sending communications, making purchases) without human approval because the agent decides not to ask or forgets to ask for confirmation
Implement graph-level interrupts that structurally guarantee a pause before high-stakes actions. Define the agent as a graph with explicit nodes, and use interrupt_before parameters on dangerous nodes. This is a structural guarantee, not a behavioral suggestion in the prompt.
Journey Context:
The common approach to human-in-the-loop is adding an 'ask_user' tool or a system prompt instruction like 'always confirm before deploying'. This is unreliable because it depends on the LLM choosing to call the tool or follow the instruction, and LLMs frequently skip confirmations when they are 'confident' in their action. The emerging pattern is to make safety guarantees structural, not behavioral. By defining the agent as a graph (as in LangGraph) and marking high-stakes nodes with interrupt_before, the orchestrator will pause execution before that node runs regardless of what the agent 'decides'. This is the difference between a door with a sign saying 'please knock' and a door with a required keycard. Tradeoffs: this requires defining your agent as an explicit graph with named nodes, which is more engineering work than a simple conversational loop. It also means your agent can't be a pure unstructured ReAct loop: you need some graph topology. But for production systems where a wrong action has real consequences, structural guarantees are non-negotiable. The pattern is: define the graph, mark dangerous nodes with interrupts, let the orchestrator handle pause/resume, and surface the proposed action with context to a human for approval or modification.
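The pattern described above can be sketched without any framework. The following is a minimal, stdlib-only illustration, not LangGraph's API: the `Graph` and `Run` classes, the `advance` method, and the node names (`plan_release`, `deploy`, `notify`) are all hypothetical and exist only for this example. The topology is a simple linear chain. In LangGraph itself, the corresponding mechanism is passing `interrupt_before=[...]` together with a checkpointer when compiling the graph; check the current LangGraph documentation before relying on exact signatures.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

State = Dict[str, object]
Node = Callable[[State], State]


@dataclass
class Graph:
    """Hypothetical toy graph: named nodes run in a fixed linear order."""
    nodes: Dict[str, Node]
    order: List[str]
    interrupt_before: frozenset = frozenset()  # names of gated (high-stakes) nodes


@dataclass
class Run:
    """Holds execution state so a paused run can be resumed later."""
    graph: Graph
    state: State
    cursor: int = 0
    log: List[str] = field(default_factory=list)

    def pending(self) -> Optional[str]:
        order = self.graph.order
        return order[self.cursor] if self.cursor < len(order) else None

    def advance(self, approved: bool = False) -> Optional[str]:
        """Run nodes until finished or until a gated node is reached.

        Returns the name of the node awaiting approval, or None when done.
        The pause is enforced here, by the orchestrator; node code (or an
        LLM inside a node) has no way to skip it.
        """
        while (name := self.pending()) is not None:
            if name in self.graph.interrupt_before and not approved:
                return name
            self.state = {**self.state, **self.graph.nodes[name](self.state)}
            self.log.append(name)
            self.cursor += 1
            approved = False  # one approval covers exactly one gated node
        return None


def plan_release(state: State) -> State:
    return {"proposal": f"deploy {state['version']} to prod"}


def deploy(state: State) -> State:
    return {"deployed": True}  # stand-in for the real high-stakes side effect


def notify(state: State) -> State:
    return {"notified": True}


graph = Graph(
    nodes={"plan_release": plan_release, "deploy": deploy, "notify": notify},
    order=["plan_release", "deploy", "notify"],
    interrupt_before=frozenset({"deploy"}),
)

run = Run(graph, {"version": "v2.1"})
paused_at = run.advance()        # stops before "deploy"
log_at_pause = list(run.log)     # only "plan_release" has run so far
proposal = run.state["proposal"] # what a human reviewer would be shown

# A human reviews (and could edit run.state) before explicitly approving.
finished = run.advance(approved=True)
```

The design point is that the gate lives in `advance`, outside any node: the `deploy` function never runs unless the caller passes `approved=True`, regardless of what the planning step produced. A production orchestrator would also persist the paused run (the role a checkpointer plays in LangGraph) so approval can arrive in a later process or request.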
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T07:29:47.712632+00:00— report_created — created