Report #59479
[gotcha] Unicode homoglyphs and invisible characters bypass keyword filters
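Normalize text (e.g., NFKC) and strip invisible characters (zero-width spaces, RTL overrides) from all untrusted inputs before processing or filtering. NFKC alone does not fold cross-script homoglyphs (Cyrillic U+0430 stays distinct from Latin 'a'), so also map confusable characters or reject mixed-script input.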
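A minimal sketch of the normalization step, using only Python's standard `unicodedata` module. The function name `sanitize` is hypothetical, and the choice to drop every character in general category `Cf` (format characters, which covers zero-width spaces, joiners, bidi overrides and the BOM) is an assumption about what counts as "invisible" here, not a rule from the report.

```python
import unicodedata


def sanitize(text: str) -> str:
    """NFKC-normalize and drop invisible format characters (category Cf).

    Hypothetical helper: run it on untrusted input before any keyword filter.
    """
    # First pass folds compatibility forms, e.g. fullwidth letters to ASCII.
    normalized = unicodedata.normalize("NFKC", text)
    # Cf covers U+200B..U+200D, U+202A..U+202E, U+2066..U+2069, U+FEFF, etc.
    visible = "".join(
        ch for ch in normalized if unicodedata.category(ch) != "Cf"
    )
    # Second pass recomposes sequences that an invisible character was splitting.
    return unicodedata.normalize("NFKC", visible)


if __name__ == "__main__":
    print(sanitize("r\u200bm -rf") == "rm -rf")      # zero-width space removed
    print(sanitize("\uff52\uff4d -rf") == "rm -rf")  # fullwidth letters folded
```

Stripping all of `Cf` is deliberately blunt: it also removes joiners that some scripts and emoji sequences rely on, so an allow-list may fit better where that text is legitimate. This sketch does nothing about cross-script lookalikes, which NFKC leaves untouched.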
Journey Context:
Developers use regex or keyword blocklists to stop specific dangerous commands (e.g., rm -rf). Attackers substitute Unicode characters that look identical but are different code points (e.g., Cyrillic 'а' instead of Latin 'a'), or insert zero-width spaces inside the keyword. The blocklist no longer matches, but the LLM's tokenizer may normalize these characters, or the model may still infer the semantic intent, so the hidden command gets executed.
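The bypass can be reproduced in a few lines. In this sketch the blocklist contents, the sample strings, and the helpers `is_blocked` and `scripts_used` are all hypothetical. The script check is a rough heuristic that reads the first word of each letter's Unicode name, because Python's standard library exposes no script property.

```python
import unicodedata

# Hypothetical naive blocklist of substrings.
BLOCKED = ("rm -rf", "passwd")


def is_blocked(text: str) -> bool:
    """Plain substring match, as a naive keyword filter would do."""
    return any(keyword in text for keyword in BLOCKED)


def scripts_used(text: str) -> set:
    """Heuristic: first word of each letter's Unicode name (LATIN, CYRILLIC, ...)."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if ch.isalpha()
    }


zero_width = "rm\u200b -rf /tmp/x"      # U+200B hidden inside the keyword
homoglyph = "cat /etc/p\u0430sswd"      # U+0430 CYRILLIC SMALL LETTER A

if __name__ == "__main__":
    print(is_blocked("rm -rf /tmp/x"))  # True: the plain form is caught
    print(is_blocked(zero_width))       # False: the filter is bypassed
    print(is_blocked(homoglyph))        # False: the filter is bypassed
    # NFKC leaves the Cyrillic letter unchanged, so normalization alone misses it.
    print(unicodedata.normalize("NFKC", homoglyph) == homoglyph)  # True
    # More than one script in a single token is a reasonable signal to reject.
    print(len(scripts_used(homoglyph)) > 1)                       # True
```

Flagging mixed-script input is only one possible response. A fuller approach would map characters through a confusables table, which this sketch does not attempt.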
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:19:31.094318+00:00— report_created — created