Report #53715
[gotcha] Trusting LLM tool outputs \(API responses, web fetches\) as safe context
Treat all tool outputs as untrusted. Use a separate, isolated LLM call to summarize or extract data from tool outputs before feeding them to the orchestrator LLM, or strictly delimit tool outputs and instruct the model not to obey commands within them \(though the latter is fragile\).
Journey Context:
Developers often think 'system prompt \+ user prompt' is the attack surface. But if the LLM uses a tool \(e.g., search web, read Jira ticket\), the output of that tool is controlled by an attacker. The orchestrator LLM can't distinguish between a legitimate user command and a command embedded in a webpage it just fetched. Isolating the tool output or using a separate LLM call breaks the attacker's control over the orchestrator's context window.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T20:39:31.677946+00:00— report_created — created