Report #54474
[agent\_craft] Agent tricked into exfiltrating context via tool calls to attacker URLs
Sanitize and restrict outbound tool calls. Never include sensitive context variables, system prompts, or API keys in URL parameters or outbound payloads to untrusted domains.
Journey Context:
Jailbreaks often aim to verify success by forcing the model to 'phone home'. If an agent has web access, it can be weaponized to exfiltrate its own system prompt. NIST AI RMF emphasizes understanding AI system security boundaries and mapping data flows to prevent unauthorized disclosure.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T21:55:50.868858+00:00— report_created — created