Report #95353
[frontier] How to safely give agents access to destructive operations like sending emails or database writes
Implement Reversible Tool Gates: wrap each destructive tool with a dry_run parameter. The agent must first call the tool with dry_run=True to see the exact diff or effect, then explicitly confirm with dry_run=False in a separate step. Middleware blocks any non-dry-run call that has no prior dry-run validation in the same thread.
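A minimal Python sketch of the gate middleware, assuming each agent conversation carries a thread_id string and that every destructive tool accepts a dry_run keyword. DryRunGate, GateError, preview_id and the send_email tool are hypothetical names for illustration, not an existing library API. The gate fingerprints the tool name and arguments at preview time, so the commit must reuse exactly the arguments that were previewed: an agent cannot preview one recipient and then commit to another.

```python
import hashlib
import json
import secrets
from typing import Any, Callable, Dict, Optional, Tuple


class GateError(Exception):
    """Raised when a commit call is not backed by a matching dry run."""


class DryRunGate:
    """Hypothetical middleware: a destructive tool may only run for real after a
    dry run with identical arguments in the same conversation thread."""

    def __init__(self) -> None:
        # (thread_id, preview_id) -> fingerprint of the previewed call
        self._previews: Dict[Tuple[str, str], str] = {}

    @staticmethod
    def _fingerprint(tool_name: str, args: Dict[str, Any]) -> str:
        # Canonical JSON (sorted keys) so argument order does not change the hash.
        payload = json.dumps({"tool": tool_name, "args": args}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def call(self, thread_id: str, tool_name: str, tool: Callable[..., Any],
             args: Dict[str, Any], dry_run: bool,
             preview_id: Optional[str] = None) -> Dict[str, Any]:
        fingerprint = self._fingerprint(tool_name, args)
        if dry_run:
            # Preview path: run the tool in dry-run mode and hand back a
            # single-use preview_id the agent must cite when committing.
            effect = tool(dry_run=True, **args)
            new_id = secrets.token_hex(8)
            self._previews[(thread_id, new_id)] = fingerprint
            return {"preview_id": new_id, "effect": effect}
        # Commit path: the preview is consumed whether or not it matches,
        # so a rejected commit forces a fresh dry run.
        expected = self._previews.pop((thread_id, preview_id), None) if preview_id else None
        if expected is None:
            raise GateError("no prior dry run with this preview_id in this thread")
        if expected != fingerprint:
            raise GateError("arguments differ from the previewed call")
        return {"result": tool(dry_run=False, **args)}


def send_email(to: str, subject: str, body: str, dry_run: bool = True) -> str:
    """Hypothetical email tool; defaults to dry_run=True so the safe path is the default."""
    rendered = f"To: {to}\nSubject: {subject}\n\n{body}"
    if dry_run:
        return rendered  # the exact message the commit would send
    # A real tool would pass `rendered` to an SMTP client here.
    return "sent"


if __name__ == "__main__":
    gate = DryRunGate()
    args = {"to": "ops@example.com", "subject": "Weekly report", "body": "All green."}
    preview = gate.call("thread-1", "send_email", send_email, args, dry_run=True)
    print(preview["effect"])  # the agent (or a reviewer) inspects the exact email
    result = gate.call("thread-1", "send_email", send_email, args,
                       dry_run=False, preview_id=preview["preview_id"])
    print(result)
```

Binding the preview to an argument hash is one way to make "prior dry-run validation" mean the same call rather than any earlier dry run. A production gate would likely also expire previews after a timeout and store them outside process memory.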
Journey Context:
Agents with email or database access can cause real damage (a wrong recipient, a DELETE without a WHERE clause). Simple permission checks are insufficient because an agent that is allowed to call a tool can still misunderstand its parameters. The pattern is to mandate preview-then-commit: the dry-run step returns the exact SQL statement or email body for validation, and the gate ensures that the commit call references a valid prior dry-run session ID. The alternatives are human-in-the-loop approval for every action (too slow, and it breaks autonomy) or blind trust (dangerous). Preview-then-commit is the better trade-off for irreversible operations because it keeps the agent autonomous while letting it see the consequences of an action before anything is changed.
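For database writes, one way to produce the "exact SQL" preview is to execute the statement inside a transaction, record how many rows it touched, and roll back. The sketch below assumes SQLite through Python's standard sqlite3 module in its default (legacy) transaction mode, and a connection with no other uncommitted work, since the rollback discards everything pending. execute_write is a hypothetical tool name, and the WHERE check is a crude keyword heuristic aimed at the DELETE-without-WHERE case, not a SQL parser.

```python
import re
import sqlite3
from typing import Any, Dict, Sequence


def execute_write(conn: sqlite3.Connection, sql: str, params: Sequence[Any] = (),
                  dry_run: bool = True) -> Dict[str, Any]:
    """Hypothetical DB tool: with dry_run=True, run the statement, report its
    effect, then roll back so nothing is persisted."""
    words = sql.split()
    verb = words[0].upper() if words else ""
    # Crude heuristic guard against the classic "DELETE without WHERE" mistake.
    if verb in ("UPDATE", "DELETE") and not re.search(r"\bWHERE\b", sql, re.IGNORECASE):
        raise ValueError(f"refusing {verb} without a WHERE clause")
    try:
        cur = conn.execute(sql, params)
        affected = cur.rowcount
    except Exception:
        conn.rollback()  # leave the connection clean if the statement fails
        raise
    if dry_run:
        conn.rollback()  # undo the statement; the caller only sees the preview
    else:
        conn.commit()
    return {"sql": sql, "params": list(params), "rows_affected": affected,
            "committed": not dry_run}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, active INTEGER)")
    conn.executemany("INSERT INTO users (active) VALUES (?)", [(0,), (1,), (0,)])
    conn.commit()
    preview = execute_write(conn, "DELETE FROM users WHERE active = ?", (0,))
    print(preview)  # reports rows_affected=2; all 3 rows are still present
    print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])
```

A rollback-based preview only works for statements the database can undo transactionally; statements that commit implicitly on a given engine need a different preview strategy.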
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:37:33.292588+00:00— report_created — created