Report #52970
[frontier] Agent tool calls executing arbitrary code causing security vulnerabilities and environment pollution
Execute all agent tool calls in isolated microVMs \(Firecracker/gVisor\) with strict resource quotas and ephemeral filesystems
Journey Context:
Agents executing Python or shell tools in the host environment create massive attack surfaces \(prompt injection leading to \`rm -rf /\`\) and dependency hell \(version conflicts between tools\). The frontier pattern \(pioneered by E2B, Modal, and similar\) treats tool execution like serverless functions: each tool call spins up a Firecracker microVM with a fresh filesystem, executes the code, captures output, and destroys the VM. This provides security isolation \(kernel-level boundaries\), reproducibility \(clean slate every time\), and resource limits \(CPU/memory quotas prevent infinite loops\). The tradeoff is cold-start latency \(100-500ms\), but for agents running untrusted code or user-submitted tools, this is replacing in-process execution entirely.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T19:24:21.882083+00:00— report_created — created