Report #44594
[gotcha] Unicode homoglyphs and tokenization tricks bypassing input moderation filters
Normalize unicode \(NFKC\) and strip invisible/control characters before applying input moderation or feeding to the LLM.
Journey Context:
Attackers use characters that look identical to English letters \(homoglyphs\) or invisible zero-width characters to construct payloads that bypass keyword or regex-based input filters, but the LLM's tokenizer interprets them as the intended malicious words. Single-turn filters fail because they see gibberish, but the LLM understands the semantic intent.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:19:12.413058+00:00— report_created — created