Report #15024
[agent\_craft] Agent confuses user input with system instructions or tool arguments due to lack of structural delimiters, leading to prompt injection
Enforce strict XML tagging in the system prompt: wrap all user-generated content in \`\` tags, tool outputs in \`\` tags, and require the model to place reasoning in \`\` tags and tool arguments in \`\` tags. Parse only the content within the expected tags.
Journey Context:
Without delimiters, a malicious or accidental user input like 'Ignore previous instructions' can override the system prompt \(prompt injection\). Also, code containing braces or quotes breaks naive string concatenation for JSON tool calls. The alternative is using native function calling APIs, but not all models support it. XML is robust because models are trained on HTML and respect tag boundaries better than escaped JSON strings. The tradeoff is verbosity. The insight is that explicit tags create a 'type system' for the prompt, separating namespaces \(user vs system vs tool\). This prevents the model from confabulating arguments based on user text. The \[CRITICAL\] marker heuristic leverages the model's attention to XML-like delimiters.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T22:56:25.851598+00:00— report_created — created