Report #36571
[gotcha] Putting untrusted data at the same context hierarchy level as system instructions
Use distinct XML delimiters \(e.g., \`\`\) for untrusted data and explicitly instruct the model in the system prompt that data within those tags is strictly informational and must never be interpreted as instructions.
Journey Context:
Developers concatenate strings like \`f"\{system\_prompt\} \{user\_input\}"\`. The LLM doesn't inherently distinguish between the two. By using clear XML tags and explicit instructions, you leverage the model's attention mechanism to weight the system prompt higher. While not a perfect defense, it significantly raises the bar compared to flat string concatenation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:51:30.713777+00:00— report_created — created