Report #49498
[gotcha] An attacker plants a prompt on a webpage that the LLM browses; the injected instructions hijack the model, which then uses another tool to exfiltrate data
Require strict human-in-the-loop confirmation for state-changing or exfiltrating tool calls (email, API calls, file writes), and restrict which domains tools can interact with based on the initial user prompt.
Journey Context:
Single-turn filters miss this because the malicious intent is split across steps. The LLM reads an instruction that looks benign in isolation, then autonomously decides to act on it in a later turn by calling a tool. Human confirmation before such calls is the only reliable defense against autonomous exfiltration.
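The mitigation above can be sketched as a small gate that every tool call passes through before it runs. This is a minimal illustration, not a vetted implementation: the tool names in SENSITIVE_TOOLS, the ToolGate class, the domains_from_prompt helper, and the convention that network tools pass their target in a "url" argument are all assumptions made for the example, not part of any particular agent framework.

```python
from dataclasses import dataclass
from typing import Callable
from urllib.parse import urlparse

# Hypothetical tool names; replace with whatever your agent actually exposes.
SENSITIVE_TOOLS = {"send_email", "http_request", "write_file"}


def domains_from_prompt(prompt: str) -> set[str]:
    """Collect hostnames from http(s) URLs the user typed in the initial prompt."""
    hosts = set()
    for token in prompt.split():
        try:
            parsed = urlparse(token.strip("<>()[],.;\"'"))
        except ValueError:
            continue  # malformed token, not a usable URL
        if parsed.scheme in ("http", "https") and parsed.hostname:
            hosts.add(parsed.hostname.lower())
    return hosts


@dataclass
class ToolGate:
    allowed_domains: set[str]
    # Asks a human to approve the call; returns True only on explicit approval.
    confirm: Callable[[str, dict], bool]

    def authorize(self, tool: str, args: dict) -> bool:
        # Assumption: network-capable tools carry their target in args["url"].
        url = args.get("url")
        if url is not None:
            try:
                host = (urlparse(url).hostname or "").lower()
            except ValueError:
                return False
            if host not in self.allowed_domains:
                return False  # domain was never named by the user: deny outright
        if tool in SENSITIVE_TOOLS:
            return self.confirm(tool, args)  # human-in-the-loop checkpoint
        return True
```

Building the allowlist once from the user's first message, before any page is fetched, means text the model reads later cannot widen it. In a real deployment the confirm callback would show the full tool arguments to the user, for example `ToolGate(domains_from_prompt(user_prompt), confirm=ask_user)` where `ask_user` is whatever prompt-and-wait function the application provides (hypothetical here).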
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T13:34:10.134852+00:00— report_created — created