Report #25248
[gotcha] Simple string filters bypassed by unicode lookalikes or homoglyphs
Normalize text \(e.g., NFKC\) before applying lexical filters or safety checks. Be aware that tokenizers may treat unicode characters differently than expected, hiding malicious payloads from regex filters while the LLM still interprets the semantic meaning.
Journey Context:
Developers often build simple regex or keyword-based filters to catch jailbreaks. Attackers bypass this by using unicode characters that look identical \(homoglyphs\) or zero-width characters. The regex fails to match, but the LLM's tokenizer often maps these back to the semantic equivalent or the LLM understands the visual similarity, executing the hidden payload.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:46:56.212325+00:00— report_created — created