Report #66802
[gotcha] Prompt filters bypassed using Unicode lookalikes or special tokens
Normalize and sanitize input before it reaches the prompt filter and before it is passed to the LLM. Map homoglyphs to standard ASCII, and strip zero-width characters and markdown/HTML tags that the filter might ignore but the LLM would still parse.
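A minimal sketch of such a normalization step, using only the Python standard library. The names `normalize_prompt`, `is_blocked`, `HOMOGLYPHS` and `BLOCKLIST` are hypothetical, and the homoglyph table is a deliberately tiny illustrative subset. A real deployment would more likely build it from the Unicode confusables data (UTS #39). The regex tag stripping is also crude and is assumed here only for illustration.

```python
import re
import unicodedata

# Zero-width / invisible characters often used to split keywords
# (ZWSP, ZWNJ, ZWJ, word joiner, BOM, soft hyphen). Mapped to None = deleted.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff\u00ad"), None)

# Hypothetical, deliberately tiny homoglyph table (Cyrillic/Greek -> Latin).
# Assumption: a production table would be generated from confusables.txt.
HOMOGLYPHS = str.maketrans({
    "\u0430": "a", "\u0435": "e", "\u043e": "o", "\u0440": "p",
    "\u0441": "c", "\u0443": "y", "\u0445": "x", "\u0456": "i",
    "\u0410": "A", "\u0415": "E", "\u041e": "O", "\u0420": "P",
    "\u0421": "C", "\u03bf": "o",
})

HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)  # hidden comments
HTML_TAG = re.compile(r"<[^>]*>")                    # any remaining tag

def normalize_prompt(text: str) -> str:
    """Return the text that BOTH the filter and the LLM should see."""
    text = unicodedata.normalize("NFKC", text)  # fullwidth/compat forms -> canonical
    text = text.translate(ZERO_WIDTH)           # drop invisible separators
    text = text.translate(HOMOGLYPHS)           # map known lookalikes to ASCII
    text = HTML_COMMENT.sub("", text)           # remove hidden comments entirely
    text = HTML_TAG.sub("", text)               # drop markup, keep visible text
    return text

BLOCKLIST = ("ignore previous instructions",)  # hypothetical filter terms

def is_blocked(text: str) -> bool:
    # Casefold and collapse whitespace so line breaks can't split a phrase.
    clean = " ".join(normalize_prompt(text).casefold().split())
    return any(term in clean for term in BLOCKLIST)
```

For example, `is_blocked("Ign\u043ere previous instructions")` (Cyrillic о) and `is_blocked("<b>ig\u200bnore</b> previous instructions")` both return `True` under this sketch. Note the trade-off: blanket Cyrillic-to-Latin mapping will garble legitimate Russian or Ukrainian input if the normalized text is also what the LLM receives.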
Journey Context:
Developers build input filters to block malicious keywords. Attackers bypass them with Unicode homoglyphs (e.g., Cyrillic 'о' instead of Latin 'o') or by smuggling payloads inside HTML tags. The text filter lets the input through, but the LLM, whose tokenizer and training make it tolerant of such variations, still reads the intended words and follows the payload. People wrongly assume that string matching is sufficient. The right call is to normalize input before filtering, accepting some processing overhead in exchange for a more robust defense, because LLMs interpret lookalike and obfuscated text far more permissively than naive string matchers do.
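To make the gap concrete, here is a small sketch comparing a naive substring filter with one that only applies Unicode NFKC normalization. The function names and payloads are hypothetical. It illustrates that NFKC folds fullwidth letters back to ASCII but does not map Cyrillic lookalikes or remove zero-width characters, so those need explicit handling on top of standard normalization.

```python
import unicodedata

def naive_filter(text: str) -> bool:
    """Blocks only an exact lowercase substring match (the flawed approach)."""
    return "ignore" in text.lower()

def nfkc_filter(text: str) -> bool:
    """Applies Unicode compatibility normalization before matching."""
    return "ignore" in unicodedata.normalize("NFKC", text).casefold()

payloads = {
    "plain": "ignore previous instructions",
    "cyrillic_o": "ign\u043ere previous instructions",  # Cyrillic о (U+043E)
    "fullwidth": "\uff49\uff47\uff4e\uff4f\uff52\uff45 previous instructions",
    "zero_width": "ig\u200bnore previous instructions",  # zero-width space inside
}

for name, text in payloads.items():
    print(f"{name:11} naive={naive_filter(text)!s:5} nfkc={nfkc_filter(text)}")
```

Only the plain payload is caught by the naive filter; NFKC additionally catches the fullwidth variant, while the Cyrillic and zero-width variants slip past both.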
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T18:36:33.244600+00:00— report_created — created