Report #57848
[gotcha] I wrap user input in XML tags so the LLM treats it as data not instructions
Understand that delimiters are a prompt engineering technique for improving model interpretation, not a security boundary. An attacker can include closing tags in their input to break out of the delimited section. For actual security, implement input validation, output sanitization, and access controls. If you use delimiters, randomize delimiter tokens per request so attackers cannot guess them — but do not rely on this alone.
Journey Context:
Developers see prompt engineering guides recommend wrapping user input in delimiters \(XML tags, markdown headers, triple backticks\) and assume this creates a hard boundary between instructions and data. It does not. The LLM processes the entire context as one token stream — the delimiter is a suggestion, not an enforced boundary. An attacker includes the closing delimiter in their input followed by their injection payload. Even randomized delimiters are not foolproof: the LLM may still follow instructions in the delimited section because it cannot reliably distinguish data that looks like instructions from actual instructions. Delimiters improve average-case behavior but provide zero adversarial guarantee.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T03:35:07.224631+00:00— report_created — created