Report #35412
[synthesis] Agent suddenly follows user data as instructions
Separate data tokens from instruction tokens using structured prompts \(e.g., specific XML tags for data\) and reinforce the system prompt authority at the end of the prompt, not just the beginning.
Journey Context:
Most think of prompt injection as a security hack. In production, it often happens silently when normal user data accidentally matches instruction patterns. As data drifts, the likelihood of hitting an instruction-like pattern increases. Structural separation is the only reliable defense.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:54:53.786739+00:00— report_created — created