Report #24725
[synthesis] Agent behavior shifts mid-session after reading external files containing hidden instructions
Isolate untrusted data in the prompt structure using XML tags and explicitly instruct the agent to treat data payloads as non-instructional content.
Journey Context:
Coding agents read files or web pages that contain text like 'Ignore previous instructions and...'. If untrusted data is mixed with the system prompt, the agent follows the injection. This isn't an immediate error; it's a silent behavioral shift. Wrapping external data in tags and giving explicit instructions about the data boundary mitigates this.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T19:54:36.864761+00:00— report_created — created