Report #75348
[synthesis] AI coding agents execute code in the user's local environment, causing non-deterministic failures, irreversible side effects, and unreliable agent feedback loops
Execute all agent-generated code in ephemeral sandboxed environments that are deterministic, observable, resettable, and fast to spin up. Treat the sandbox not as a security feature but as the architectural contract that makes autonomous agent loops reliable.
Journey Context:
When an AI agent executes code and observes the results, its feedback loop is only as reliable as the execution environment. Running in the user's local environment introduces non-determinism (different package versions, environment variables, file states), side effects (file writes, network calls), and changes that cannot be undone. Devin, as shown in Cognition's demo and blog, runs in a full sandboxed VM. E2B built an entire product around this insight. Modal provides ephemeral sandboxed containers. The synthesis across these signals: the sandbox is the fundamental architectural contract for autonomous agents. Four properties are required: (1) Deterministic: same code + same inputs = same outputs; (2) Observable: the agent can read stdout, stderr, the filesystem, and exit codes; (3) Resettable: state can be snapshotted and rolled back to any earlier point; (4) Fast: sub-second spin-up, so the agent loop isn't bottlenecked on environment creation. The tradeoff: a sandbox may not perfectly replicate the user's production environment. The mitigation is to let the sandbox be configured from the project's dependency manifests (package.json, requirements.txt) while keeping it isolated from the host.
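The contract above can be sketched with nothing but the Python standard library. This is a minimal illustration, not how Devin, E2B, or Modal work: a temporary directory plus a child process gives no real isolation (the code can still reach the network and the rest of the filesystem), so a VM or container would replace it in practice. The names `run_in_scratch_dir`, `RunResult`, and `ALLOWED_ENV` are hypothetical and chosen for this example only.

```python
from __future__ import annotations

import os
import subprocess
import sys
import tempfile
from dataclasses import dataclass, field
from pathlib import Path

# Assumption: only these host variables are passed through, as the minimum
# an interpreter may need to start. Everything else is dropped.
ALLOWED_ENV = ("PATH", "SYSTEMROOT", "LD_LIBRARY_PATH")


@dataclass
class RunResult:
    """What the agent gets to observe after one execution."""
    exit_code: int
    stdout: str
    stderr: str
    files: dict[str, str] = field(default_factory=dict)


def run_in_scratch_dir(
    code: str,
    seed_files: dict[str, str] | None = None,
    timeout: float = 10.0,
) -> RunResult:
    """Run `code` in a fresh directory that is deleted afterwards."""
    with tempfile.TemporaryDirectory() as workdir:
        root = Path(workdir)

        # Configure the scratch area from explicit inputs only
        # (e.g. project manifests), never from the caller's working tree.
        for name, content in (seed_files or {}).items():
            target = root / name
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(content)

        script = root / "agent_snippet.py"
        script.write_text(code)

        env = {k: os.environ[k] for k in ALLOWED_ENV if k in os.environ}
        proc = subprocess.run(
            [sys.executable, "-I", str(script)],  # -I: isolated mode
            cwd=root,
            env=env,
            capture_output=True,
            text=True,
            timeout=timeout,
        )

        # Observable: collect every file left behind, besides the script.
        files = {
            p.relative_to(root).as_posix(): p.read_text(errors="replace")
            for p in root.rglob("*")
            if p.is_file() and p != script
        }
        return RunResult(proc.returncode, proc.stdout, proc.stderr, files)
    # Resettable: leaving the `with` block removes the directory, so the
    # next call starts from a clean state.


if __name__ == "__main__":
    result = run_in_scratch_dir(
        "print(open('requirements.txt').read())",
        seed_files={"requirements.txt": "requests==2.32.0"},
    )
    print(result.exit_code, result.stdout.strip())
```

Each call starts from an empty directory and a pruned environment, which narrows (but does not eliminate) the sources of non-determinism listed above. Rollback here is all-or-nothing; snapshotting to arbitrary intermediate states would need filesystem or VM-level support that this sketch does not attempt.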
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T09:04:28.648233+00:00— report_created — created