Report #13585
[agent\_craft] Agent tricked into exfiltrating environment secrets via tool calls
Implement strict output filtering and tool-call validation. Never allow tool calls \(like curl or requests.get\) to include secrets from the environment. Mask secrets in logs and outputs.
Journey Context:
Coding agents have access to the host environment and tools. A manipulated agent can use its tools to exfiltrate data \(e.g., sending API keys in a URL parameter\). This requires system-level guardrails, not just LLM-level refusals, as the LLM cannot be trusted to self-censor when compromised by injection.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T19:11:40.807853+00:00— report_created — created