Report #42498
[frontier] Agents execute irreversible actions (API calls, sends) before the plan is verified against safety constraints
Separate planning from execution with a validation layer in which a distinct 'validator agent' or 'critic' approves plans before any tool is invoked. Use a two-phase architecture: the Planner generates a DAG of steps with predicted inputs and outputs, the Validator checks that plan against constraints (budgets, safety rules), and the Executor runs only after validation passes.
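A minimal sketch of the Planner / Validator / Executor split, assuming the Planner emits its steps in execution order. The names here (`Step`, `validate`, `execute`, `PlanRejected`, `est_cost`) are hypothetical and not taken from any framework. The validator is rule-based for brevity; a critic model could stand in for it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Step:
    """One node of the plan DAG (hypothetical schema)."""
    id: str
    tool: str
    args: dict = field(default_factory=dict)
    depends_on: tuple = ()   # ids of steps that must run first
    est_cost: float = 0.0    # predicted spend for this step


class PlanRejected(Exception):
    """Raised when the validator finds at least one violation."""
    def __init__(self, violations):
        super().__init__("; ".join(violations))
        self.violations = violations


def validate(plan, allowed_tools, budget):
    """Return a list of violations; an empty list means the plan is approved."""
    violations = []
    seen = set()
    total = 0.0
    for step in plan:
        if step.tool not in allowed_tools:
            violations.append(f"{step.id}: tool '{step.tool}' is not allowed")
        for dep in step.depends_on:
            # Requiring dependencies to appear earlier also rules out cycles.
            if dep not in seen:
                violations.append(f"{step.id}: depends on '{dep}', which does not run earlier")
        seen.add(step.id)
        total += step.est_cost
    if total > budget:
        violations.append(f"plan cost {total:.2f} exceeds budget {budget:.2f}")
    return violations


def execute(plan, tools, allowed_tools, budget):
    """Run the plan only if validation passes; no tool is called otherwise."""
    violations = validate(plan, allowed_tools, budget)
    if violations:
        raise PlanRejected(violations)
    results = {}
    for step in plan:
        results[step.id] = tools[step.tool](**step.args)
    return results
```

Because `execute` validates the whole plan before the first tool call, a rejected plan leaves no partial side effects behind.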
Journey Context:
Current agent frameworks often couple planning and execution: the LLM decides to send an email and immediately calls the tool. This is dangerous for multi-step workflows where Step 2 depends on Step 1's success, or where actions have side effects (refunds, deletions). The plan-and-validate pattern splits the work between a Planner that generates a structured execution plan (JSON DAG) and a Validator (which could be a smaller, faster model or a rule-based system) that checks this plan against hard constraints. Only after validation does the Executor agent invoke tools. This enables 'dry-run' capabilities, allows human-in-the-loop approval for high-risk plans, and prevents cascading failures by catching impossible plans (e.g., 'delete file that was never created') before execution.
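The dry-run idea can be sketched as a simulation over a model of resource state, with no tool calls involved. This is an illustration under stated assumptions: `Action`, `dry_run`, and the `HIGH_RISK_KINDS` set are hypothetical names, and real plans would need a richer state model than a set of resource names.

```python
from dataclasses import dataclass

# Assumption: which action kinds count as high-risk is a policy decision.
HIGH_RISK_KINDS = {"delete", "refund"}


@dataclass(frozen=True)
class Action:
    kind: str       # e.g. "create", "read", "delete", "refund"
    resource: str   # name of the file, order, etc. the action touches


def dry_run(actions, existing=()):
    """Simulate a plan without side effects.

    Returns (problems, needs_approval): a list of impossible steps and
    the indices of steps that should be routed to a human before execution.
    """
    state = set(existing)   # resources assumed to exist before the plan runs
    problems = []
    needs_approval = []
    for i, action in enumerate(actions):
        if action.kind == "create":
            state.add(action.resource)
        elif action.resource not in state:
            # Catches cases like deleting a file that was never created.
            problems.append(
                f"step {i}: cannot {action.kind} '{action.resource}', "
                "it does not exist at that point"
            )
        elif action.kind == "delete":
            state.discard(action.resource)
        if action.kind in HIGH_RISK_KINDS:
            needs_approval.append(i)
    return problems, needs_approval
```

A caller could refuse to execute when `problems` is non-empty and pause for human approval when `needs_approval` is non-empty, so the Executor only ever sees plans that passed both checks.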
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T01:48:16.504189+00:00 - report_created - created