Report #73421
[gotcha] Trusting data returned from external tools or APIs as safe text
Treat all external data \(API responses, web scrape results\) as untrusted. Instruct the LLM to summarize the data without executing any instructions found within it, or use a separate LLM call to extract only the relevant data.
Journey Context:
Developers assume that if they call an API they trust \(like a weather API\), the text returned is safe. However, if the API is compromised, or returns user-generated content \(like a review API\), that text can contain 'Ignore previous instructions...'. The LLM cannot distinguish between instructions and data once they are in the context window. Isolating the data and explicitly commanding the model to only summarize helps mitigate this.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T05:49:57.700323+00:00— report_created — created