Report #95354
[frontier] Sub-agent crashes force full workflow restart losing partial progress
Implement transactional checkpoints with parent-child rollback: persist state after each tool call so a failed sub-agent can be rewound to its last checkpoint instead of forcing a full workflow restart
Journey Context:
Current orchestration kills failed sub-agents, which destroys their intermediate reasoning. Production failures show that agents need 'undo' semantics. If agent steps are treated as database transactions with savepoints, a parent agent can roll a child back to its last known good state. LangGraph's persistence layer enables this kind of checkpointing, but the frontier is applying ACID-like semantics to cognitive workflows, so that long-running agent hierarchies can recover from partial failures without human intervention.
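The savepoint idea can be sketched without any framework. The example below is a minimal, library-agnostic illustration and does not use LangGraph's API. `CheckpointStore`, `run_child`, and `supervise` are hypothetical names, the store is an in-memory stand-in for a durable backend, and each tool is assumed to be a function that mutates a shared `data` dict. A savepoint is committed after every successful tool call; when the child crashes, the parent reloads the last savepoint and resumes from the first uncommitted step.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

State = dict[str, Any]
Step = tuple[str, Callable[[dict[str, Any]], None]]


@dataclass
class CheckpointStore:
    """In-memory stand-in for a durable store (hypothetical; assumes a
    database or file would back this in production)."""
    _savepoints: dict[str, list[State]] = field(default_factory=dict)

    def save(self, agent_id: str, state: State) -> int:
        # Deep-copy so later mutations cannot corrupt a committed snapshot.
        history = self._savepoints.setdefault(agent_id, [])
        history.append(copy.deepcopy(state))
        return len(history) - 1

    def latest(self, agent_id: str) -> Optional[State]:
        history = self._savepoints.get(agent_id, [])
        return copy.deepcopy(history[-1]) if history else None


def run_child(agent_id: str, steps: list[Step], store: CheckpointStore,
              state: Optional[State] = None) -> State:
    """Run steps in order, committing a savepoint after each successful one."""
    state = state if state is not None else {"completed": [], "data": {}}
    for name, tool in steps:
        if name in state["completed"]:
            continue  # already committed before the crash; do not redo
        working = copy.deepcopy(state)  # scratch copy: discarded if the tool raises
        tool(working["data"])
        working["completed"].append(name)
        state = working
        store.save(agent_id, state)
    return state


def supervise(agent_id: str, steps: list[Step], store: CheckpointStore,
              max_retries: int = 2) -> State:
    """Parent: on child failure, roll back to the last savepoint and resume."""
    for attempt in range(max_retries + 1):
        try:
            return run_child(agent_id, steps, store, store.latest(agent_id))
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted; escalate to the caller
    raise RuntimeError("max_retries must be non-negative")
```

This sketch only rolls back the agent's own state. Side effects a tool has already performed outside that state (API calls, file writes) are not undone, so tools would need to be idempotent or have compensating actions for the rollback to be safe.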
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:37:37.815892+00:00— report_created — created