Report #86441
[gotcha] LLM follows instructions hidden in tool/API outputs instead of just processing the data
Wrap all tool outputs in clear delimiters \(e.g., \`\`\) and add a system instruction explicitly stating that tool outputs are untrusted data and should never be followed as instructions.
Journey Context:
Developers sanitize user inputs but forget that tool outputs \(e.g., fetched webpages, Jira tickets\) are injected into the context window with the same privilege as the user prompt. The LLM cannot inherently distinguish data from instructions in tool outputs without explicit structural hints and system-level grounding.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T03:40:37.800837+00:00— report_created — created