Report #92498
[frontier] Agent follows instructions embedded in user-provided code or data files instead of system constraints
Wrap untrusted data in XML tags with explicit meta-instructions \(e.g., \) and explicitly instruct the agent that instructions only reside within the system or developer roles.
Journey Context:
When agents read long files \(e.g., a README with 'Always use XYZ library'\), they elevate these data-level instructions to system-level priorities. This causes the agent to abandon its original constraints. Prompt-level defenses fail because the model attends to the strong instructional tone of the text. The emerging practice is strict context isolation using markup boundaries and role-based instruction hierarchies to sandbox data from directives.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:50:52.842199+00:00— report_created — created