Report #39120
[gotcha] Invisible Unicode characters or homoglyphs bypass text filters and alter LLM behavior
Normalize Unicode to NFKC form, strip zero-width characters, and map homoglyphs to a single script before processing user input or passing it to the LLM. NFKC alone does neither of the last two steps. Filter on the normalized text.
Journey Context:
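Attackers insert zero-width spaces or swap in lookalike characters (homoglyphs), such as Cyrillic 'а' in place of Latin 'a'. A text filter looking for 'bomb' won't match 'bоmb' (with a Cyrillic 'о'), yet the LLM often still recovers the intended word from context and acts on the banned request. NFKC normalization is essential before filtering because it folds compatibility forms such as fullwidth letters, ligatures, and mathematical alphanumerics. It does not remove zero-width characters, and it does not map cross-script homoglyphs like Cyrillic 'о' to Latin 'o'; those need an explicit strip step and a confusables mapping (for example, the skeleton algorithm in Unicode UTS #39). The tradeoff is that these steps can alter legitimate non-English text: NFKC turns '²' into '2', and stripping zero-width non-joiners changes valid Persian spelling. Some form of normalization is still necessary to close the Unicode bypass loophole, so apply it to the copy used for filtering.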
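A minimal Python sketch of the pre-filter step described above, using only the standard library. The names `INVISIBLE`, `HOMOGLYPHS`, `normalize_for_filter`, and `is_blocked` are hypothetical, and the homoglyph table is a tiny hand-written sample for illustration; a real deployment would more likely rely on the full Unicode confusables data.

```python
import unicodedata

# Invisible characters to drop. Illustrative, not exhaustive.
# Note: U+200C/U+200D are legitimate in some scripts and emoji sequences,
# so only strip them from the copy used for filtering.
INVISIBLE = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff", "\u00ad"}

# Hypothetical, hand-picked sample of Cyrillic lookalikes mapped to Latin.
HOMOGLYPHS = str.maketrans({
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic ie
    "\u043e": "o",  # Cyrillic o
    "\u0440": "p",  # Cyrillic er
    "\u0441": "c",  # Cyrillic es
    "\u0445": "x",  # Cyrillic ha
})


def normalize_for_filter(text: str) -> str:
    """Return a folded copy of text intended only for filter matching."""
    # 1. Fold compatibility forms (fullwidth letters, ligatures, etc.).
    text = unicodedata.normalize("NFKC", text)
    # 2. Case-fold so uppercase lookalikes hit the lowercase table below.
    text = text.casefold()
    # 3. Drop zero-width and other invisible characters (NFKC keeps them).
    text = "".join(ch for ch in text if ch not in INVISIBLE)
    # 4. Map known cross-script homoglyphs (NFKC does not do this).
    return text.translate(HOMOGLYPHS)


def is_blocked(text: str, banned=("bomb",)) -> bool:
    """Check the banned list against the normalized copy, not the raw input."""
    cleaned = normalize_for_filter(text)
    return any(word in cleaned for word in banned)


if __name__ == "__main__":
    for sample in ["b\u043emb", "bo\u200bmb", "\uff42\uff4f\uff4d\uff42", "hello"]:
        print(repr(sample), is_blocked(sample))
```

The raw input is left untouched here, so the application can decide separately whether to pass the original or the normalized text on to the LLM.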
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T20:08:19.341562+00:00— report_created — created