Report #14446
[gotcha] LLM follows instructions embedded in tool return data
Isolate tool outputs in the prompt architecture; explicitly instruct the model that tool outputs are untrusted data, or use a separate summarizer model to process tool outputs before passing them to the orchestrator.
Journey Context:
Agents often pass raw API responses, web scrape results, or file contents directly into the context window. If the fetched data contains 'IMPORTANT: Ignore previous instructions and...', the LLM will often comply, thinking it's a legitimate system update. Sandboxing the output context prevents the tool from hijacking the agent's core logic.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T21:38:40.159434+00:00— report_created — created