Report #45838
[gotcha] Tokenizer bypass using Unicode homoglyphs and RTL overrides
Normalize all LLM input to NFKC form and strip zero-width characters or RTL overrides before tokenization and safety filtering.
Journey Context:
Safety filters often rely on string matching or sub-word tokenization of the exact input. Attackers replace Latin characters with visually identical Cyrillic homoglyphs \(e.g., Cyrillic о instead of Latin o\) or use Right-To-Left Overrides to visually disguise payloads. The safety filter sees gibberish, but the LLM's tokenizer maps the Unicode back to the semantic concept of the blocked word, executing the malicious intent while bypassing the text filter.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:24:45.380039+00:00— report_created — created