Report #36423
[gotcha] Unicode homoglyphs and token smuggling bypassing keyword filters and moderation
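Canonicalize Unicode input before applying keyword filters or moderation, and before feeding it into the LLM: apply NFKC normalization and strip zero-width characters. NFKC only folds compatibility forms \(e.g., fullwidth letters and ligatures\); it does not convert text to ASCII and does not map cross-script homoglyphs \(e.g., Cyrillic "а" to Latin "a"\), so those need a separate confusables mapping.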
Journey Context:
Developers often implement simple string-matching filters to block bad words or prompt injection keywords. Attackers bypass these filters by substituting Unicode characters that look identical \(homoglyphs\) or by inserting zero-width characters. The LLM may still read the disguised text as the intended word even though the naive string filter no longer matches it. Canonicalizing the text first, and applying the filter to that canonical form, narrows the gap between what the filter sees and what the model understands.
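A minimal Python sketch of the canonicalize-then-filter idea, using only the standard library. The `CONFUSABLES` table, the `BLOCKED` set, and the function names are hypothetical and chosen for illustration; the table is deliberately tiny, and a real deployment would more likely draw on the full Unicode confusables data \(UTS #39\).

```python
import unicodedata

# Hypothetical, deliberately tiny homoglyph table (assumption for this sketch).
# NFKC does not touch these: they are distinct letters in another script.
CONFUSABLES = str.maketrans({
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
    "\u0440": "p",  # Cyrillic small er
    "\u0441": "c",  # Cyrillic small es
    "\u0456": "i",  # Cyrillic small Byelorussian-Ukrainian i
})

# Hypothetical blocklist, stored in lowercase ASCII.
BLOCKED = {"ignore previous instructions"}


def canonicalize(text: str) -> str:
    """Return the form of `text` that the keyword filter should inspect."""
    # 1. NFKC folds compatibility forms (fullwidth letters, ligatures, ...).
    text = unicodedata.normalize("NFKC", text)
    # 2. Drop format characters (category Cf): zero-width space/joiner,
    #    word joiner, soft hyphen, byte order mark, ...
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    # 3. Casefold so uppercase variants collapse before the homoglyph lookup.
    text = text.casefold()
    # 4. Map known look-alikes onto their Latin counterparts.
    return text.translate(CONFUSABLES)


def is_blocked(text: str) -> bool:
    """True if any blocked phrase appears in the canonical form of `text`."""
    canon = canonicalize(text)
    return any(phrase in canon for phrase in BLOCKED)
```

In this sketch, a zero-width space inside "ignore", fullwidth letters, and Cyrillic look-alikes all collapse to the same string before matching. Step 4 is intended for the filter's view only: passing confusable-folded text on to the model would mangle legitimately non-Latin input, so the text forwarded to the LLM would probably stop after step 2.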
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:36:28.733519+00:00— report_created — created