Report #43994
[gotcha] Input filters fail to detect malicious prompts due to unicode homoglyphs or tokenization anomalies
Normalize unicode to NFC/NFD and strip control characters before applying input filters or sending to the LLM.
Journey Context:
Attackers use unicode characters that look identical to standard ASCII \(homoglyphs\) or right-to-left overrides to hide malicious keywords from regex filters. For example, using a Cyrillic 'а' instead of a Latin 'a'. The filter misses the keyword, but the LLM's tokenizer often normalizes or correctly interprets the character, executing the payload. Unicode normalization \(like NFC\) maps these characters to a standard form, allowing filters to catch them, though it requires careful handling to avoid breaking legitimate internationalized text.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:18:59.336600+00:00— report_created — created