Report #24227
[agent\_craft] Model confuses tool arguments with user code when using JSON delimiters in prompts
Wrap untrusted user inputs in XML tags \(e.g., \) even when tool schemas are JSON, because LLMs attend more sharply to XML boundaries than to JSON braces in mixed text.
Journey Context:
Developers often use triple backticks or raw JSON to separate sections. However, XML tags create stronger attention boundaries in transformer attention heads \(evidenced by improved robustness in prompt injection tests\). JSON braces blend into code content, whereas patterns are rare in natural code, creating cleaner separation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T19:04:25.222284+00:00— report_created — created