Report #93346
[gotcha] Unicode homoglyphs and invisible characters bypass keyword filters and tokenizers
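Normalize Unicode input (NFKC) and strip invisible/control characters (such as zero-width spaces and joiners) before tokenization or filtering. NFKC folds compatibility forms such as fullwidth letters, but it does not map cross-script look-alikes (Cyrillic 'а' stays Cyrillic), so pair it with a confusables mapping or a mixed-script check. Do not rely on exact string matching for safety filters.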
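A minimal sketch of the normalize-and-strip step in Python, using only the standard-library `unicodedata` module. The function name `sanitize` and the decision to keep tab and newline are assumptions made for illustration, not part of the report.

```python
import unicodedata


def sanitize(text: str) -> str:
    """NFKC-normalize, then drop format (Cf) and control (Cc) characters."""
    normalized = unicodedata.normalize("NFKC", text)
    kept = []
    for ch in normalized:
        category = unicodedata.category(ch)
        if category == "Cf":
            # Format characters: zero-width space/joiner, BOM, soft hyphen, bidi marks.
            continue
        if category == "Cc" and ch not in "\n\t":
            # Control characters, except the whitespace we chose to keep (assumption).
            continue
        kept.append(ch)
    return "".join(kept)


if __name__ == "__main__":
    print(sanitize("k\u200bi\u200dll"))        # zero-width characters removed
    print(sanitize("\uff4b\uff49\uff4c\uff4c"))  # fullwidth letters folded by NFKC
```

Removing every Cf character also removes joiners that are meaningful in emoji sequences and in some scripts, so it may be safer to run the filter on a sanitized copy and leave the user's original text untouched.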
Journey Context:
Attackers substitute characters that look identical (e.g., Cyrillic 'а' for Latin 'a') or insert zero-width spaces to break up words (e.g., 'k' + U+200B + 'ill', which renders as 'kill'). Naive regex or keyword filters miss these because the raw string never contains the blocked keyword, yet the text can still reach the model as that word: the tokenizer's own normalization may fold some of the characters away, and the model can often recover the intended word from what remains. Developers apply safety filters to raw text without accounting for how the LLM's tokenizer handles Unicode differently.
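The bypass described above can be reproduced in a few lines of Python. This is a sketch under stated assumptions: `CONFUSABLES` is a hypothetical, deliberately tiny mapping (a real deployment would derive one from the Unicode confusables data), `BLOCKLIST` reuses the report's example keyword, and `scripts` is a rough heuristic based on Unicode character names, since the standard library has no script property.

```python
import unicodedata

# Hypothetical, deliberately incomplete look-alike map (illustration only).
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}
BLOCKLIST = {"kill"}  # example keyword from the report


def fold(text: str) -> str:
    """NFKC, drop format characters, casefold, then map known look-alikes."""
    text = unicodedata.normalize("NFKC", text)
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    text = text.casefold()
    return "".join(CONFUSABLES.get(ch, ch) for ch in text)


def scripts(word: str) -> set:
    """Heuristic: first word of each letter's Unicode name, e.g. 'LATIN'."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in word if ch.isalpha()}


def naive_match(text: str) -> bool:
    """Exact substring match on the raw text (the failing approach)."""
    return any(word in text for word in BLOCKLIST)


def hardened_match(text: str) -> bool:
    """Same match, applied to the folded text."""
    return any(word in fold(text) for word in BLOCKLIST)


if __name__ == "__main__":
    attack = "k\u0456\u200bll"  # Cyrillic 'i' plus a zero-width space
    print(naive_match(attack), hardened_match(attack), scripts("k\u0456ll"))
```

A word that mixes scripts, as reported by `scripts`, is a useful secondary signal even when the look-alike map has no entry for the character involved.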
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T15:16:04.085885+00:00— report_created — created