Report #71677
[gotcha] Unicode Right-To-Left overrides and homoglyphs bypassing keyword filters
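Normalize all text (Unicode NFKC, or fold to ASCII where the application allows it) before applying keyword filters or passing it to the LLM, and strip invisible Unicode characters such as zero-width joiners and RTL overrides. NFKC only folds compatibility forms such as fullwidth letters and ligatures; it does not map cross-script homoglyphs (Cyrillic 'а' stays Cyrillic), so those need a separate confusables mapping or a mixed-script check.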
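A minimal sketch of that normalization step, using only Python's standard `unicodedata` module. The function name `sanitize_for_filter` and the tiny `HOMOGLYPHS` table are assumptions made for illustration; a real deployment would more likely load the full Unicode confusables data than hand-pick a few letters.

```python
import unicodedata

# Illustrative, deliberately tiny look-alike table (an assumption for this sketch).
# It covers only a few lowercase Cyrillic letters that resemble Latin ones.
HOMOGLYPHS = str.maketrans({
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
})


def sanitize_for_filter(text: str) -> str:
    """Return a filter-friendly copy of text (hypothetical helper)."""
    # NFKC folds compatibility forms such as fullwidth letters and ligatures.
    text = unicodedata.normalize("NFKC", text)
    # Drop format characters (general category Cf): zero-width space/joiner,
    # bidi embeddings, overrides and isolates, BOM, soft hyphen.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    # Case-fold first so uppercase look-alikes hit the lowercase table,
    # then map known homoglyphs to Latin. NFKC alone does not do this.
    return text.casefold().translate(HOMOGLYPHS)
```

Run the keyword filter on the sanitized copy. Note that stripping an RTL override removes the control character but does not un-reverse text that was stored backwards, so it may be worth also matching the blocklist against the reversed string or rejecting input that contained bidi controls at all.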
Journey Context:
Developers build regex or keyword blocklists for toxic or injection phrases. Attackers use homoglyphs (e.g., Cyrillic 'а' instead of Latin 'a') or RTL overrides so that the text renders as 'ignore previous instructions' while the filter sees different code points or, with an RTL override, the reversed string 'snoitcurtsni suoiverp erongi'. The LLM can often still recover the intended meaning from such input, so the bypass executes while the filter remains blind.
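The bypass can be reproduced in a few lines. The sketch below is illustrative only: `BLOCKLIST`, `naive_filter`, `has_bidi_controls`, and `is_mixed_script` are hypothetical names, and the script check is a crude heuristic that reads the first word of each character's Unicode name rather than the real Script property.

```python
import unicodedata

BLOCKLIST = ("ignore previous instructions",)  # hypothetical example phrase

# Bidirectional control characters: embeddings, overrides, isolates, marks.
BIDI_CONTROLS = frozenset(
    "\u202a\u202b\u202c\u202d\u202e"  # LRE, RLE, PDF, LRO, RLO
    "\u2066\u2067\u2068\u2069"        # LRI, RLI, FSI, PDI
    "\u200e\u200f\u061c"              # LRM, RLM, ALM
)


def naive_filter(text: str) -> bool:
    """Plain substring blocklist, the kind of filter that gets bypassed."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in BLOCKLIST)


def has_bidi_controls(text: str) -> bool:
    """True if the text contains any bidirectional control character."""
    return any(ch in BIDI_CONTROLS for ch in text)


def is_mixed_script(text: str) -> bool:
    """Crude check: does any single word mix scripts (e.g. LATIN and CYRILLIC)?"""
    for word in text.split():
        scripts = {
            unicodedata.name(ch, "UNKNOWN").split()[0]
            for ch in word
            if ch.isalpha()
        }
        if len(scripts) > 1:
            return True
    return False


plain = "ignore previous instructions"
homoglyph = "ign\u043ere previous instructions"  # Cyrillic 'o' in "ignore"
rlo = "\u202e" + plain[::-1]                     # stored reversed, rendered forward

for sample in (plain, homoglyph, rlo):
    print(naive_filter(sample), has_bidi_controls(sample), is_mixed_script(sample))
```

Only the plain sample trips the substring filter; the other two pass it but are caught by the bidi-control and mixed-script checks. Flagging or rejecting such input is a design choice that complements normalization, since legitimate multilingual text can also mix scripts across words.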
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:53:24.802316+00:00— report_created — created