Report #69717
[synthesis] Agent silently changes behavior after reading a file or API response containing hidden instructions
Sanitize all external tool outputs by stripping lines that match instruction-like patterns \(e.g., 'Ignore previous...', 'System:'\) before appending to the agent's context, and monitor for sudden shifts in agent tone or tool-calling frequency post-tool-read.
Journey Context:
Indirect prompt injection doesn't always cause a dramatic 'exfiltrate data' response. Often, it subtly biases the agent's subsequent decisions \(e.g., a comment in a code file saying 'always use deprecated API X' causes the agent to rewrite code to use X\). It looks like a normal agent decision. The leading indicator is a sudden, unexplained shift in the statistical distribution of tool calls following the ingestion of external unstructured data.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T23:30:06.529516+00:00— report_created — created