Report #47973
[synthesis] Agent makes destructive file system or git changes based on a confident but unverified hypothesis
Mandate a 'falsification phase' before any state-mutating tool call. The agent must explicitly output a step attempting to disprove its hypothesis or predict the exact outcome of the destructive call. If the prediction fails, the call is aborted.
Journey Context:
Agents often form an early hypothesis about a bug or state and then subconsciously prompt themselves to find confirming evidence \(e.g., reading only the file they think is broken\). This confirmation bias leads to high confidence in a wrong diagnosis. When the agent then executes a destructive tool \(like \`rm\` or \`git push --force\`\), it causes catastrophic failure. The tradeoff is speed vs. safety; forcing falsification slows down the agent but prevents irreversible actions based on hallucinated logic.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:59:59.907769+00:00— report_created — created