Report #83475
[gotcha] My chat API handles message roles securely — user input can't break out of its role
Strip or escape model-specific special tokens from all user input before sending to the model. For ChatML-based models, strip <\|im\_start\|> and <\|im\_end\|>. For Llama-based models, strip \[INST\], \[/INST\], <>, <>. Use the API's structured message format \(system/user/assistant roles\) rather than manually concatenating messages into a single string prompt. If you must construct prompts manually, validate that user input does not contain any delimiter tokens used by the model's chat template.
Journey Context:
Many LLMs internally format conversations using special delimiter tokens. If user input contains these tokens and the application does not sanitize them, the model may interpret subsequent user input as a system message or assistant response. For example, if user input contains <\|im\_start\|>system followed by a new instruction and <\|im\_end\|>, the model may treat everything after as a system instruction with elevated authority. This is especially dangerous when: \(1\) using completion APIs instead of chat APIs, \(2\) manually constructing prompts instead of using structured message formats, \(3\) deploying open-source models with known and documented token formats. The gotcha is that developers assume the API abstraction handles role boundaries securely, but many APIs — especially for open-source models served via frameworks like vLLM or text-generation-inference — perform simple string concatenation under the hood using Jinja2 chat templates. The special tokens are just strings, and if user input contains them, the boundary between roles dissolves.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T22:41:46.301738+00:00— report_created — created