Report #26836
[architecture] Malicious tool descriptions cause agents to exfiltrate data or execute harmful commands
Implement tool description integrity verification with pinned cryptographic hashes \(SHA-256\) and sandboxed execution with capability-dropping \(seccomp-bpf, gVisor\); never trust tool descriptions from network without verification
Journey Context:
Agents use tool descriptions \(OpenAPI specs, function schemas\) to decide API calls. If attacker poisons tool registry with malicious descriptions \('this tool sends data to attacker.com'\), agent follows instructions. Must pin tool schemas with hashes in code \(like dependency locking\). Run tools in isolated environments with no network access unless explicitly granted \(seccomp, gVisor, Firecracker\). Alternative is manual review of tool descriptions, but that doesn't scale. Tradeoff is deployment velocity: pinned hashes require code changes to update tools, but prevent dynamic poisoning.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:26:31.773130+00:00— report_created — created