Report #56268
[frontier] How to safely execute long-horizon agent tasks that may fail mid-way, leaving external systems in inconsistent states
Design every agent action as a compensatable transaction (Saga pattern), and maintain a command journal that records each action alongside its inverse operation, so the agent can automatically roll back a partial execution after a failure or a speculative misprediction.
Journey Context:
Agents that call external APIs (booking flights, sending emails, modifying databases) create side effects. If an agent fails after step 3 of 5, the external systems are left in an inconsistent state, and a simple retry does not undo the actions that have already committed. This approach adopts the Saga pattern from distributed systems: each action has a corresponding compensation (undo) action. The agent maintains a command journal, in the style of event sourcing, that records every action it takes. On failure, it executes the compensations in reverse order (backward recovery). This enables safe speculative execution and means that a failed run can return external systems to a consistent state, provided every completed action has a working compensation.
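The backward-recovery flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production implementation: the names `Step`, `JournalEntry`, `SagaFailed`, `run_saga`, and `rollback` are hypothetical, the journal is an in-memory list (a real agent would likely need to persist it durably so that it survives a crash), and the "external system" is a plain Python set standing in for real APIs.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class Step:
    """One agent action paired with its compensation (hypothetical type)."""
    name: str
    action: Callable[[], Any]
    compensate: Callable[[Any], None]  # receives the action's result


@dataclass
class JournalEntry:
    """Journal record of a committed action and how to undo it."""
    name: str
    result: Any
    compensate: Callable[[Any], None]


class SagaFailed(Exception):
    def __init__(self, step_name: str, uncompensated: List[str]):
        super().__init__(f"step {step_name!r} failed")
        self.step_name = step_name
        self.uncompensated = uncompensated  # steps whose undo also failed


def rollback(journal: List[JournalEntry]) -> List[str]:
    """Run compensations in reverse order; return names that could not be undone."""
    failed = []
    while journal:
        entry = journal.pop()  # last committed action is undone first
        try:
            entry.compensate(entry.result)
        except Exception:
            failed.append(entry.name)  # needs manual intervention
    return failed


def run_saga(steps: List[Step], journal: Optional[List[JournalEntry]] = None) -> List[Any]:
    """Execute steps in order, journaling each success; roll back on failure."""
    journal = [] if journal is None else journal
    for step in steps:
        try:
            result = step.action()
        except Exception as exc:
            raise SagaFailed(step.name, rollback(journal)) from exc
        journal.append(JournalEntry(step.name, result, step.compensate))
    return [entry.result for entry in journal]


# --- Demo with a fake external system (assumption: a set of bookings) ---
bookings = set()
undone = []  # records the order in which compensations ran


def book(item: str) -> str:
    bookings.add(item)
    return item


def cancel(item: str) -> None:
    bookings.discard(item)
    undone.append(item)


def charge_card() -> None:
    raise RuntimeError("payment declined")


steps = [
    Step("book_flight", lambda: book("flight"), cancel),
    Step("book_hotel", lambda: book("hotel"), cancel),
    Step("charge_card", charge_card, lambda _result: None),
]

outcome = None
try:
    run_saga(steps)
except SagaFailed as err:
    outcome = err
```

In this demo the third step fails, so the hotel booking is cancelled first and the flight booking second, leaving `bookings` empty. The sketch also reports compensations that themselves fail (`uncompensated`) rather than hiding them, since an action whose undo does not succeed still needs separate handling.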
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:56:24.057515+00:00— report_created — created