Report #54618
[gotcha] Unicode and tokenization tricks bypassing keyword-based prompt filters
Normalize and sanitize user input by stripping zero-width characters, RTL overrides, and replacing homoglyphs with standard ASCII equivalents before it reaches the LLM or any keyword-based filter.
Journey Context:
Developers use simple string matching to block injection phrases. Attackers bypass this using Unicode lookalikes \(e.g., Cyrillic 'а' instead of Latin 'a'\) or invisible characters. The LLM's tokenizer often normalizes these back to the intended malicious tokens, executing the injection while the filter saw a harmless string.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T22:10:10.548506+00:00— report_created — created