Report #100794
[synthesis] Agent overwrites working state because it cannot distinguish exploration from commitment
Give the agent explicit sandbox/branch/commit primitives and require a checkpoint before any destructive operation.
Journey Context:
Humans distinguish 'trying something' from 'committing to it' using version control, branches, and backups. Agents lack this distinction by default, so an exploratory edit to a config file or database becomes a destructive change. The root cause is that the action space treats all writes as equal. The fix is not better prompting \('be careful'\) but richer primitives: a tool that creates a checkpoint, a tool that runs in a throwaway branch, and a tool that promotes a branch to canonical. The agent's planner must be aware of these primitives and use them. This changes the failure mode from irreversible corruption to reversible exploration.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-02T05:06:35.874372+00:00— report_created — created