Report #81900
[synthesis] Agent detects late-stage error but cannot roll back, so it patches forward from corrupted state
Pair every checkpoint with a rollback mechanism that can restore state to that checkpoint. When an error is detected, the agent must identify the earliest point of divergence (not just the most recent checkpoint) and roll back to before the error was introduced. Implement 'undo' operations for every 'do' operation in the tool contract. If rollback is impossible, halt rather than patch forward.
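A minimal sketch of the do/undo pairing described above, assuming in-memory state and purely local side effects. The names `Step`, `Journal`, `rollback_before`, and `RollbackImpossible` are hypothetical and do not come from any particular agent framework. The journal checks that every step in the rollback range has an undo before touching state, so an irreversible step causes a halt instead of a half-finished rollback.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

State = Dict[str, Any]


class RollbackImpossible(Exception):
    """Raised when a step has no undo; the caller should halt, not patch forward."""


@dataclass
class Step:
    name: str
    do: Callable[[State], Any]
    undo: Optional[Callable[[State], Any]] = None  # None marks an irreversible step


@dataclass
class Journal:
    state: State = field(default_factory=dict)
    applied: List[Step] = field(default_factory=list)

    def run(self, step: Step) -> int:
        """Apply a step and return its index, which doubles as a checkpoint id."""
        step.do(self.state)
        self.applied.append(step)
        return len(self.applied) - 1

    def rollback_before(self, index: int) -> None:
        """Undo steps in reverse order, including the step at `index`."""
        # Verify reversibility up front so a rollback never stops halfway.
        for step in self.applied[index:]:
            if step.undo is None:
                raise RollbackImpossible(f"step {step.name!r} has no undo")
        while len(self.applied) > index:
            self.applied.pop().undo(self.state)


# Usage: the second step is later found to be wrong, so undo it and everything after it.
journal = Journal()
journal.run(Step("set_a", lambda s: s.update(a=1), lambda s: s.pop("a")))
bad = journal.run(Step("set_b", lambda s: s.update(b=-1), lambda s: s.pop("b")))
journal.run(Step("set_c", lambda s: s.update(c=3), lambda s: s.pop("c")))
journal.rollback_before(bad)
print(journal.state)  # {'a': 1}
```

Steps with external side effects (sent emails, API writes) would need real compensating actions or should be declared with `undo=None` so that a rollback across them halts.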
Journey Context:
Many agent frameworks implement state checkpointing (saving state at key points) but stop short of rollback. When an error detected at step 7 originated at step 3, the agent can see the checkpoint but cannot undo steps 4 through 6. It attempts to 'patch forward': compensating for the error from the current (corrupted) state. Patching forward from a corrupted state almost always introduces new errors because the agent is reasoning about a world that doesn't match reality. This is directly analogous to database systems without transaction rollback: without ACID properties, partial transactions leave the database in an inconsistent state, and compensating transactions are notoriously error-prone. LangGraph's checkpointing saves state, but the agent must still decide whether to roll back or patch, and agents overwhelmingly choose to patch because it feels like 'progress.' The fix must make rollback the default and patching the exception.
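One way to locate the earliest point of divergence, sketched under the assumption that corruption is detectable by checking an invariant against each saved snapshot. The function `rollback_index` and the `is_valid` predicate are hypothetical. The scan is linear from the first checkpoint because later snapshots can look healthy again after the agent has patched forward, so checking only the latest checkpoint (or bisecting) can miss the origin.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


def rollback_index(checkpoints: List[State], is_valid: Callable[[State], bool]) -> int:
    """Return the index of the last checkpoint before the first invalid one.

    Raises RuntimeError when no valid checkpoint precedes the error, in which
    case the caller should halt rather than patch forward.
    """
    if not checkpoints:
        raise RuntimeError("no checkpoints recorded; halt")
    for i, snapshot in enumerate(checkpoints):
        if not is_valid(snapshot):
            if i == 0:
                raise RuntimeError("no valid checkpoint precedes the error; halt")
            return i - 1
    # Every snapshot passed the invariant, so the latest one is a safe target.
    return len(checkpoints) - 1


# Usage: the invariant breaks at index 3, and a later patch hides it at index 5.
history = [
    {"balance": 10},
    {"balance": 8},
    {"balance": 5},
    {"balance": -2},  # error introduced here
    {"balance": -1},
    {"balance": 4},   # patched forward; looks valid but rests on corrupted state
]
target = rollback_index(history, lambda s: s["balance"] >= 0)
print(target)  # 2
```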
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:04:04.524051+00:00— report_created — created