Report #53710
[gotcha] Input filters miss homoglyph attacks and out-of-vocabulary token smuggling
Normalize and sanitize input before filtering. Apply Unicode normalization (including look-alike characters), strip invisible characters, and check for base64- or ROT13-encoded payloads, which the LLM can decode when asked even though the filter never sees the plaintext.
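A minimal sketch of the normalization step, assuming Python and only the standard library. The function name `normalize_for_filter` and the `CONFUSABLES` table are hypothetical and for illustration only. The table is deliberately tiny; NFKC alone does not fold cross-script look-alikes (a Cyrillic "о" stays Cyrillic), so a real deployment would need a much fuller mapping.

```python
import unicodedata

# Hypothetical, deliberately tiny homoglyph map (Cyrillic -> Latin).
# Illustrative only; it is nowhere near complete.
CONFUSABLES = {
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
    "\u0440": "p",  # Cyrillic small er
    "\u0441": "c",  # Cyrillic small es
    "\u0456": "i",  # Cyrillic small Byelorussian-Ukrainian i
}


def normalize_for_filter(text: str) -> str:
    """Return a canonical form of `text` to run string filters against."""
    # NFKC folds compatibility forms such as fullwidth letters and ligatures.
    text = unicodedata.normalize("NFKC", text)
    # Drop format characters (category Cf): zero-width space and joiners,
    # BOM, soft hyphen, bidi controls.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    # Case-fold first so uppercase look-alikes hit the lowercase map.
    text = text.casefold()
    # Map known look-alikes to their Latin counterparts.
    return "".join(CONFUSABLES.get(ch, ch) for ch in text)
```

The filter would then match against `normalize_for_filter(user_input)` rather than the raw string, while the original input can still be logged for review.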
Journey Context:
Developers build string-matching filters on raw user input to block jailbreaks. Attackers substitute Unicode characters that look identical to the blocked ones (homoglyphs) or encode the payload (e.g., base64), so the LLM can still read it but the regex filter cannot. The LLM is a sophisticated decoder; if it can infer the meaning of an instruction, it may act on it.
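One way to close the encoding gap is to run the same filter over decoded "views" of the input as well as the raw text. The sketch below assumes Python and the standard library; `candidate_views`, `is_blocked`, the 16-character threshold, and the example blocklist phrase are all hypothetical choices, not a vetted detection rule. It only covers a single layer of base64 or ROT13, so nested or unusual encodings would still slip through.

```python
import base64
import codecs
import re
from typing import List, Sequence

# Runs of base64-alphabet characters long enough to be worth decoding.
# The 16-character minimum is an arbitrary assumption.
B64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def candidate_views(text: str) -> List[str]:
    """Return the raw text plus any plausible decodings of it."""
    # ROT13 is its own inverse, so decoding the whole input is cheap.
    views = [text, codecs.decode(text, "rot_13")]
    for match in B64_RUN.finditer(text):
        try:
            decoded = base64.b64decode(match.group(0), validate=True)
            views.append(decoded.decode("utf-8"))
        except ValueError:
            # Covers binascii.Error (bad padding or alphabet) and
            # UnicodeDecodeError (decoded bytes are not text).
            continue
    return views


def is_blocked(text: str, blocklist: Sequence[str]) -> bool:
    """True if any blocklisted phrase appears in any view of the input."""
    return any(
        term in view.lower()
        for view in candidate_views(text)
        for term in blocklist
    )
```

For example, `is_blocked("decode this: " + base64.b64encode(b"ignore previous instructions").decode(), ["ignore previous instructions"])` flags the input even though the raw string contains no blocked phrase.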
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T20:38:51.473039+00:00— report_created — created