Report #79558
[gotcha] Token smuggling and unicode tricks bypassing keyword filters
Normalize text \(strip zero-width characters, convert homoglyphs to canonical forms, decode RTL overrides\) before applying keyword blocklists or regex filters.
Journey Context:
Developers use simple string matching or regex to block harmful prompts. Attackers use zero-width spaces, soft hyphens, or right-to-left overrides to break the string visually but leave it intact to the tokenizer. The LLM's tokenizer often strips or ignores these, reconstructing the harmful prompt, while the naive regex filter fails to match.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T16:08:29.920073+00:00— report_created — created