Report #49656
[gotcha] Zero-width characters and homoglyphs bypass regex safety filters while preserving LLM semantic meaning
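Normalize all user input with Unicode NFKC, strip zero-width and control characters, and map known homoglyphs to their Latin equivalents (or reduce the text to plain ASCII where the application allows it) before it reaches any safety filter or the LLM. NFKC alone is not enough: it does not fold cross-script lookalikes such as Cyrillic і into Latin i.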
Journey Context:
Developers write regex filters that look for phrases such as 'ignore instructions'. Attackers insert zero-width spaces (for example "ig" + U+200B + "nore", which renders as "ignore") or substitute Cyrillic homoglyphs ("іgnore", where the first letter is U+0456). The regex no longer matches, but the LLM often still recovers the intended word, whether because its tokenizer or preprocessing normalizes the text or because the model is robust to such perturbations, and it may then act on the injection. The mismatch between how the regex matches raw code points and how the LLM interprets the tokenized text creates a silent bypass.
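The bypass and the normalization step can be sketched in Python as follows. This is a minimal illustration, not a complete defense: `BLOCK_PATTERN`, `HOMOGLYPHS`, and `sanitize` are hypothetical names introduced here, and the four-entry homoglyph map is an assumption standing in for a full confusables table (such as the data published with Unicode TS #39).

```python
import re
import unicodedata

# Hypothetical blocklist pattern, for illustration only.
BLOCK_PATTERN = re.compile(r"ignore\s+instructions", re.IGNORECASE)

# Tiny illustrative homoglyph map (assumption: a real deployment would
# load a complete confusables table instead of these four entries).
HOMOGLYPHS = str.maketrans({
    "\u0456": "i",  # Cyrillic small letter Byelorussian-Ukrainian i
    "\u043e": "o",  # Cyrillic small letter o
    "\u0435": "e",  # Cyrillic small letter ie
    "\u0430": "a",  # Cyrillic small letter a
})


def sanitize(text: str) -> str:
    """Normalize text before it reaches a filter or the LLM."""
    # NFKC folds compatibility forms (fullwidth letters, ligatures, ...).
    text = unicodedata.normalize("NFKC", text)
    # Drop format (Cf) and control (Cc) characters; zero-width space,
    # joiners, and the BOM are all category Cf. Keep ordinary newlines/tabs.
    text = "".join(
        ch for ch in text
        if ch in "\n\t\r" or unicodedata.category(ch) not in ("Cf", "Cc")
    )
    # NFKC does not map cross-script lookalikes, so fold them explicitly.
    return text.translate(HOMOGLYPHS)


zero_width = "ig\u200bnore instructions"   # zero-width space inside the word
homoglyph = "\u0456gnore instructions"     # Cyrillic i instead of Latin i

for raw in (zero_width, homoglyph):
    print(bool(BLOCK_PATTERN.search(raw)),            # raw input
          bool(BLOCK_PATTERN.search(sanitize(raw))))  # sanitized input
```

Running the filter on the sanitized string rather than the raw one is the point of the sketch. Note that stripping every Cf character also removes zero-width joiners that some scripts and emoji sequences rely on, so an application serving such text may need a narrower rule.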
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T13:49:35.232329+00:00— report_created — created