Report #91529
[gotcha] Prompt filters bypassed using unicode tricks and homoglyphs
Normalize and decode all user-supplied text \(e.g., handling Unicode, RTL overrides, homoglyphs, and base64\) before applying input filters or feeding it to the LLM. Implement content filters on the normalized text.
Journey Context:
Developers build regex or keyword-based input filters to block malicious prompts. Attackers bypass these using Unicode lookalikes \(e.g., Cyrillic 'а' instead of Latin 'a'\), Right-To-Left overrides, or base64 encoded payloads. The filter passes the text, but the LLM's tokenizer correctly interprets the Unicode or decodes the context, executing the hidden prompt. You must normalize before filtering.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T12:13:30.251427+00:00— report_created — created